ISSN  0316-6295 


A  Study  of  Program 
And  Memory  Policy  Behaviour 


G.  Scott  Graham 

Computer  Systems  Research  Group 
University  of  Toronto 

Technical  Report  CSRG-77 
May  1977 


COMPUTER  SYSTEMS  RESEARCH  GROUP 

UNIVERSITY OF TORONTO




The  Computer  Systems  Research  Group  (CSRG)  is  an  interdisciplinary 
group  formed  to  conduct  research  and  development  relevant  to 
computer  systems  and  their  applications.  It  is  jointly  administered 
by  the  Department  of  Electrical  Engineering  and  the  Department  of 
Computer  Science  of  the  University  of  Toronto,  and  is  supported  in 
part  by  the  National  Research  Council  of  Canada. 


Digitized  by  the  Internet  Archive 
in  2018  with  funding  from 
University  of  Toronto 


https://archive.org/details/technicalreportc77univ 


Abstract 


This report is a condensed version of the author's Ph.D. thesis,
"A Study of Program and Memory Policy Behaviour". Chapters 1 and 2 of
the thesis have been replaced by the paper, "Multiprogrammed Memory
Management", by Peter J. Denning and G. Scott Graham.

We  survey  memory  management  policies,  in  particular  the  Page  Fault 
Frequency  (PFF)  and  Denning  Working  Set  (DWS)  variable  partition 
policies.  We  show  that  PFF  is  subject  to  both  anomalous  and  gap 
behaviours,  not  exhibited  by  DWS.  These  behaviours  make  PFF  difficult 
to  control;  additional  experiments  on  adjusting  the  memory  policy 
parameter  confirm  this  observation.  Using  trace  tape  information,  we 
relate  certain  features  of  the  lifetime  function  and  space-time  cost 
for  DWS  and  PFF.  This  relationship  is  developed  further  in  a  queueing 
network  model,  where  we  investigate  the  lifetime  knee  heuristic  for 
optimal  system  throughput. 

We  survey  program  behaviour  models,  in  particular  the  phase/ 
transition model. Empirically derived policy-based semi-Markov models
of  program  behaviour  are  developed,  but  are  not  found  to  be  useful  in 
reproducing  performance  measure  values.  We  propose  that  a  proper  model 
would  be  obtained  by  separating  a  program  reference  string  into  phases 
and  transitions. 


Notes 


The  page  numbering  in  the  Table  of  Contents  reflects  the  complete 
version  of  the  thesis,  with  Chapters  1  and  2  included. 

The "Multiprogrammed Memory Management" paper is copyright (c)
1975  by  The  Institute  of  Electrical  and  Electronics  Engineers  Inc. 
Reprinted,  with  permission,  from  Proceedings  of  the  IEEE,  June  1975, 
Vol.  63,  No.  6,  pp.  924-939. 


The  assistance  and  encouragement  of  Ken  Sevcik  and  Jim  Horning, 
present  and  past  Chairmen  of  the  Computer  Systems  Research  Group,  is 
appreciated. 

The  publication  of  this  report  was  supported  in  part  by  the 
Connaught  Fund,  University  of  Toronto,  and  by  the  National  Research 
Council  Grant  A9274. 




A  STUDY  OF  PROGRAM 
AND  MEMORY  POLICY  BEHAVIOUR 


A  Thesis 

Submitted  to  the  Faculty 

of 

Purdue  University 

by 

Gordon  Scott  Graham 

In  Partial  Fulfillment  of  the 
Requirements  for  the  Degree 

of 

Doctor  of  Philosophy 


December  1976 




ACKNOWLEDGEMENTS 


It is a pleasure to express my appreciation to my
advisor, Peter J. Denning, for his constant encouragement
and inspiration. Much of whatever is good in this thesis is
due to his scholarship and insight; whatever errors remain
are my responsibility alone. I also wish to thank the other
members of my advisory committee, Herbert D. Schwetman and
Saul Rosen, for their help.


I  am  grateful  to  Hisashi  Kobayashi  and  his  group, 
including  Mohamed  Ghanem,  for  the  opportunity  to  work  on 
program  behaviour  at  the  IBM  T.J.  Watson  Research  Center 
during  the  summer  of  1973.  The  reference  string  tapes, 
central  to  this  thesis,  were  gathered  at  that  time.  I  wish 
to  thank  Bob  Ingebrand  for  his  help  in  generating  the  trace 
tapes,  and  George  Cox  and  Nigel  Horspool  for  their 
subsequent  help  with  the  tapes.  Inge  Weber  deserves  thanks 
for  entering  much  of  this  thesis  through  the  text  editor. 


Many people have helped me, in a variety of ways, during
my long graduate career. I wish to acknowledge Dennis
Tsichritzis, Jack Cedarholm, and Frank Friedman. Pat Hume
and Tom Hull, present and past Chairmen of the Department of
Computer Science, University of Toronto, also added their
own special brand of encouragement.


This  work  was  partially  funded  by  Grant  GJ-41289  of  the 
National  Science  Foundation,  whose  support  is  appreciated. 




TABLE OF CONTENTS

                                                              PAGE

LIST OF TABLES ..... vii
LIST OF FIGURES ..... ix
ABSTRACT ..... xii

CHAPTER 1 - INTRODUCTION AND OVERVIEW ..... 1
  1.1 Introduction ..... 1
  1.2 Overview ..... 9

CHAPTER 2 - BACKGROUND ..... 11
  2.1 Notation and Terminology ..... 11
    2.1.1 The Computer System ..... 11
    2.1.2 Memory Management ..... 15
    2.1.3 Performance Measures ..... 21
  2.2 Classification of Memory Management Policies ..... 34
    2.2.1 Manual vs. Automatic ..... 34
    2.2.2 Fixed Partitioning ..... 39
    2.2.3 Variable Partitioning ..... 40
  2.3 Observations about Program Behaviour ..... 56
    2.3.1 Queueing Network Studies ..... 56
    2.3.2 Variable Locality Size ..... 66
  2.4 Models of Program Behaviour ..... 74
    2.4.1 i.i.d. Models ..... 76
    2.4.2 LRU Models ..... 81
    2.4.3 Markov Models ..... 86
    2.4.4 Markov Models of Class V3 Policies ..... 87
    2.4.5 Semi-Markov Models ..... 89

CHAPTER 3 - EXPERIMENTAL INVESTIGATION OF MEMORY POLICIES ..... 92
  3.1 Introduction ..... 92
    3.1.1 The Phase/Transition Model of Program Behaviour ..... 92
    3.1.2 Relationships among Performance Measures ..... 93
    3.1.3 Criteria for Memory Policy Controllers ..... 93
  3.2 Methodology ..... 95
  3.3 A Study of the Phase/Transition Model ..... 101
    3.3.1 Multiple Knee Behaviour ..... 103
    3.3.2 Dominance of DWS Knees ..... 109
    3.3.3 Irremovable Overshoot ..... 111
    3.3.4 Summary ..... 119
  3.4 More about Performance Measures ..... 119
    3.4.1 The Knee Criterion ..... 120
    3.4.2 Queueing Network Model Description ..... 127
    3.4.3 Queueing Network Model Results ..... 133
  3.5 More about Controller Criteria ..... 142

CHAPTER 4 - SEMI-MARKOV MEMORY DEMAND MODELS ..... 148
  4.1 Introduction ..... 148
  4.2 General Description ..... 149
  4.3 Analysis of a Policy Model Markov Chain ..... 151
    4.3.1 Mean Memory Demand ..... 151
    4.3.2 DWS Paging Rate ..... 152
    4.3.3 PFF Paging Rate ..... 153
  4.4 Holding Time Distribution ..... 154
  4.5 Reference String Generation ..... 156
  4.6 Model Testing and Results ..... 157
  4.7 A Better Approach ..... 163

CHAPTER 5 - CONCLUSIONS AND FUTURE WORK ..... 165
  5.1 Discussion ..... 165

BIBLIOGRAPHY ..... 167

APPENDICES ..... 177
  Appendix A - The Trace Monitor ..... 178
  Appendix B - The Data Reduction Programs ..... 180
  Appendix C - The Observed Lifetime Functions ..... 182
  Appendix D - The Observed Space-time Costs ..... 206
  Appendix E - The Parameter Value Extraction Programs ..... 223
  Appendix F - The Observed Parameter Values ..... 225
  Appendix G - The Synthetic Reference String Generation Programs ..... 227
  Appendix H - Goodness of Fit between Actual and Synthetic Lifetime Curves ..... 232




LIST OF TABLES

Table                                                         Page

3.2.1    Comments on the programs traced ..... 96
3.2.2    Relative average costs of the data reduction programs ..... 99
3.3.1.1  DWS multiple knees ..... 106
3.3.1.2  PFF multiple knees ..... 107
3.3.1.3  Investigation of phase structure ..... 108
3.3.2.1  Dominance of DWS lifetime value at its knee ..... 110
3.3.2.2  Dominance of DWS lifetime knee slope ..... 112
3.3.3.1  DWS and PFF overshoot data ..... 118
3.4.1.1  Relationship of nth level lifetime knees and space-time local minima for DWS ..... 122
3.4.1.2  Relationship of nth level lifetime knees and space-time local minima for PFF ..... 123
3.4.1.3  Relationship of nth level lifetime knees and space-time local minima for LRU ..... 124
3.4.1.4  Space-time global minima behaviour ..... 126
3.4.3.1  DWS and PFF loads at lifetime knees and maximum throughputs ..... 134
3.4.3.2  DWS and PFF load ranges for the throughput plateaux ..... 136
3.4.3.3  DWS lifetime knee and throughput plateaux relationship ..... 137
3.4.3.4  PFF lifetime knee and throughput plateaux relationship ..... 138
3.4.3.5  LRU lifetime knee and throughput plateaux relationship ..... 139
3.4.3.6  Effect of S on the knee criterion for reference string P6 and the DWS policy ..... 141
4.6.1    Relative percentage difference between actual and deterministic micromodel synthetic lifetime curves ..... 161
4.6.2    Summary of relative percentage difference between actual and synthetic lifetime curves ..... 162
A.1      TRAM output record format ..... 179
B.1      Some of the instruction references deleted by STRIP ..... 181
B.2      DWS mean memory sizes for the standard window sizes ..... 185
B.3      PFF mean memory sizes for the standard threshold window sizes ..... 186
B.4      VMIN mean memory sizes for the standard window sizes ..... 187
E.1      Relative costs of parameter value extraction programs and data reduction programs ..... 224
F.1      Detailed observed parameter values ..... 226
G.1      Relative costs of the synthetic generation programs ..... 231
H.1      Relative percentage differences between actual and synthetic lifetime curves ..... 233
H.2      Synthetic reproduction of actual knee lifetimes ..... 243
H.3      Synthetic reproduction of number of lifetime knees ..... 244




LIST OF FIGURES

Figure                                                        Page

1.1.1    Memory usage during separate compilations ..... 4
1.1.2    Program execution as a sequence of phases and subphases ..... 5
1.1.3    An example of the benefits of good locality ..... 8
2.1.1.1  A computer system model ..... 12
2.1.1.2  A queueing network view of a computer system ..... 14
2.1.1.3  The effect of thrashing ..... 16
2.1.2.1  Updating the stack of a stack algorithm ..... 20
2.1.2.2  The LRU stack ..... 22
2.1.2.3  The OPT priority list and stack ..... 23
2.1.3.1  The page fault rate function ..... 25
2.1.3.2  The duty factor function ..... 26
2.1.3.3  The space-time function ..... 29
2.1.3.4  The lifetime function ..... 32
2.2.1    Classification of memory management policies ..... 35
2.2.1.1  Running time vs. memory allocation size for an automatic policy on the M44 computer ..... 37
2.2.1.2  Relative benefits of improving memory policy and of improving locality ..... 38
2.2.3.1  Convexity and the lifetime function ..... 42
2.2.3.2  DWS locality set estimation for a window of size T ..... 48
2.2.3.3  PFF locality set estimation for a threshold window of size THRESH ..... 51
2.2.3.4  General structure for the PFF anomaly ..... 53
2.2.3.5  Specific example of the PFF anomaly ..... 54
2.3.1.1  Lifetime function approximation for the convex region of Pi ..... 58
2.3.1.2  Computer system model of Spirn ..... 60
2.3.1.3  Computer system model of Sekino ..... 61
2.3.1.4  Computer system model of Chamberlin et al. ..... 63
2.3.1.5  Computer system model of Ghanem ..... 65
2.3.2.1  Dynamic memory size changes under PFF ..... 67
2.3.2.2  Dynamic memory size changes under DWS ..... 68
2.3.2.3  Space-time product under LRU and DWS ..... 70
2.3.2.4  Page fault rate function under LRU, OPT, and DWS ..... 71
2.3.2.5  Comparison of effects ..... 73
2.4.1.1  Page fault rate prediction of RRM ..... 77
2.4.1.2  Average DWS size vs. window size ..... 80
3.2.1    Effect of reference string length on the DWS P1 lifetime ..... 97
3.2.2    Sequencing of data reduction program execution and tape creation ..... 100
3.2.3    Exact and approximate DWS space-time costs for P3 ..... 102
3.3.1.1  Multiple knee features ..... 105
3.3.3.1  Reaction of VMIN and DWS to a transition at time t ..... 113
3.3.3.2  Difference in resident set sizes for VMIN and DWS ..... 115
3.3.3.3  Reaction of MDWS to a transition at time t ..... 116
3.3.3.4  Overshoots of DWS and PFF at their knees ..... 117
3.4.1.1  Testing the dominance of the DWS knee lifetime ..... 128
3.4.2.1  Queueing network model ..... 130
3.4.2.2  5 and 10 percent plateaux on the throughput curve ..... 132
3.5.1    5 and 10 percent plateaux on the space-time curve ..... 144
3.5.2    DWS window sizes for 5 percent and 10 percent plateaux on the space-time curve ..... 145
3.5.3    PFF threshold window sizes for 5 percent and 10 percent plateaux on the space-time curve ..... 146
4.4.1    Example of the poor results of assuming a state-independent mean holding time ..... 155
4.6.1    DWS lifetime functions for the actual and DWS deterministic micromodel P3 reference string ..... 159
4.6.2    DWS lifetime functions for the actual and PFF deterministic micromodel P3 reference string ..... 160
B.1      Data structures for the DWS analyzer ..... 183
C.1      P1 lifetime functions for OPT, LRU, and DWS ..... 190
C.2      P1 lifetime functions for VMIN, DWS, and PFF ..... 191
C.3      P2 lifetime functions for OPT, LRU, and DWS ..... 192
C.4      P2 lifetime functions for VMIN, DWS, and PFF ..... 193
C.5      P3 lifetime functions for OPT, LRU, and DWS ..... 194
C.6      P3 lifetime functions for VMIN, DWS, and PFF ..... 195
C.7      P4 lifetime functions for OPT, LRU, and DWS ..... 196
C.8      P4 lifetime functions for VMIN, DWS, and PFF ..... 197
C.9      P5 lifetime functions for OPT, LRU, and DWS ..... 198
C.10     P5 lifetime functions for VMIN, DWS, and PFF ..... 199
C.11     P6 lifetime functions for OPT, LRU, and DWS ..... 200
C.12     P6 lifetime functions for VMIN, DWS, and PFF ..... 201
C.13     P7 lifetime functions for OPT, LRU, and DWS ..... 202
C.14     P7 lifetime functions for VMIN, DWS, and PFF ..... 203
C.15     P8 lifetime functions for OPT, LRU, and DWS ..... 204
C.16     P8 lifetime functions for VMIN, DWS, and PFF ..... 205
D.1      P1 space-time costs for OPT, LRU, and DWS ..... 207
D.2      P1 space-time costs for DWS and PFF ..... 208
D.3      P2 space-time costs for OPT, LRU, and DWS ..... 209
D.4      P2 space-time costs for DWS and PFF ..... 210
D.5      P3 space-time costs for OPT, LRU, and DWS ..... 211
D.6      P3 space-time costs for DWS and PFF ..... 212
D.7      P4 space-time costs for OPT, LRU, and DWS ..... 213
D.8      P4 space-time costs for DWS and PFF ..... 214
D.9      P5 space-time costs for OPT, LRU, and DWS ..... 215
D.10     P5 space-time costs for DWS and PFF ..... 216
D.11     P6 space-time costs for OPT, LRU, and DWS ..... 217
D.12     P6 space-time costs for DWS and PFF ..... 218
D.13     P7 space-time costs for OPT, LRU, and DWS ..... 219
D.14     P7 space-time costs for DWS and PFF ..... 220
D.15     P8 space-time costs for OPT, LRU, and DWS ..... 221
D.16     P8 space-time costs for DWS and PFF ..... 222
G.1      Data flow for the GDL program ..... 230
H.1      P1 lifetime fit for GDL under DWS
H.2      P1 lifetime fit for GPL under DWS
H.3      P1 lifetime fit for GL under DWS
H.4      P1 lifetime fit for GDL under PFF
H.5      P1 lifetime fit for GPL under PFF
H.6      P1 lifetime fit for GL under PFF
H.7      P1 lifetime fit for GDL under VMIN
H.8      P1 lifetime fit for GPL under VMIN
H.9      P1 lifetime fit for GL under VMIN ..... 242



ABSTRACT 


Graham,  Gordon  Scott.  Ph.D.,  Purdue  University,  December 
1976.  A  Study  of  Program  and  Memory  Policy  Behaviour.  Major 
Professor:  Peter  J.  Denning. 


This thesis is an investigation into program and memory
policy behaviour. We describe studies of program behaviour
and its relation to performance. We present a detailed
study of program behaviour in light of the phase/transition
model. We investigate the ability of empirically derived
semi-Markov policy models to reproduce locality behaviour.

Chapter 2 surveys program behaviour models and concludes
that the phase/transition model forms the most realistic
model. We classify memory policies into fixed partition and
three variable partition policies. The working set policies
are shown to perform the best, because they correlate memory
allocation with program behaviour and have automatic load
control. The Page Fault Frequency (PFF) algorithm is shown
to be subject to both anomalous and gap behaviours, not
exhibited by the Denning Working Set (DWS) algorithm.

Chapter  3  presents  results  of  detailed  experiments  on 
program  behaviour.  We  show  that  the  DWS  lifetime  function 
contains  considerable  information  on  phase/transition 
behaviour  such  as  nesting  among  program  phases.  There  are 
strong  correlations  between  lifetime  knees  and  space-time 
minima  for  many  memory  policies,  especially  for  the  DWS 
policy.  Queueing  network  studies  verify  for  actual  programs 
the  near  optimality  of  running  the  DWS  policy  with  its 
window  size  set  to  operate  at  its  lifetime  knee.  We  show 
how  to  estimate  the  irremovable  memory  allocation  overshoot 
of  nonlookahead  memory  policies  relative  to  the  optimal 
variable  partition  policy,  VMIN,  and  find  that  PFF  has  more 
overshoot than DWS. Using the least number of parameter
values required to cause all members of our program ensemble
to  be  within  some  tolerance  of  their  space-time  minima  as  a 
measure,  we  find  the  PFF  policy  to  be  inherently  more 
difficult  than  DWS  to  control. 


Chapter  4  shows  that  empirically  derived  policy-based 
semi-Markov  models  are  not  useful  as  locality  macromodels. 
They contain too little information for allowing a synthetic
generator to track the program's locality set. We find that
the Least Recently Used Stack Model, which itself performs
poorly, does better than a PFF or DWS policy-based
macromodel. We propose that a proper model would be
obtained by separating the program reference string into
phases and transitions, obtaining separate parameters for a
phase model and a transition model, and then recomposing
behaviour during synthetic generation.


PROCEEDINGS OF THE IEEE, VOL. 63, NO. 6, JUNE 1975


Multiprogrammed  Memory  Management 

PETER J. DENNING, SENIOR MEMBER, IEEE, AND G. SCOTT GRAHAM


Abstract: A queueing network is used to show that the page-fault-rate
functions of active programs are the critical factors in system processing
efficiency. Properties of page-fault functions are set forth in terms of a
locality  model  of  program  behavior.  Memory  management  policies  are 
grouped  into  two  fixed-partition  and  three  variable-partition  classes 
according  to  their  methods  of  allocating  memory  and  controlling  the 
multiprogramming  load.  It  is  concluded  that  the  so-called  working  set 
policies  can  be  expected  to  yield  the  lowest  paging  rates  and  highest 
processing  efficiency  of  all  the  classes. 

I.  Introduction 

ELICITING A FULL, or even adequate, level of performance
from a multiprogrammed computer system has
proved to be a difficult goal. Much to the regret of its
designers, many a system was put together with only exiguous
concern for its ultimate behavior, perhaps because the issues of
Manuscript  received  September  24,  1974;  revised  January  3,  1975. 
This  work  was  supported  in  part  by  the  National  Science  Foundation 
under  Grant  GJ-41269. 

P.  J.  Denning  is  with  the  Department  of  Computer  Science,  Purdue 
University,  West  Lafayette,  Ind.  47907. 

G. S. Graham is with the Department of Computer Science, University
of Toronto, Toronto, Ont., Canada, M5S 1A7.


system organization were more pressing or interesting, or
because the complexities of the interactions among the demands
of different programs for various resources were underestimated.
Particularly vexing have been a variety of instability
problems, commonly called "thrashing" [42], and the inability
to know which of a myriad of possibilities is the most
efficient method of managing a system's memory resources.
Two converging streams of research have been increasing our
knowledge of analysis and control of system behavior; their
eventual confluence will enable the design of new systems
whose behavior can confidently be predicted, and may enable
improvements in existing systems. One stream comprises
modeling and analysis methods, particularly of networks of
interacting queues, which permit studying the effects of competing
resource demands both in steady and transient state.
Though steady-state analysis is more fully developed and provides
great insight, full solutions to stability problems await
the development of transient-state analyses [26]. The other
stream comprises the study of program behavior and memory
management, that is, the characterization of the relationship
between observable patterns of accessing information and

Copyright  ©1975  by  The  Institute  of  Electrical  and  Electronics  Engineers,  Inc. 
Printed  in  U.S.A.  Annals  No.  506PR01 1 




demands  on  memory  and  other  system  resources,  and  their 
subsequent  use  in  designing  policies  of  memory  management. 
This  paper  surveys  the  present  state  of  knowledge  about  the 
interaction  of  these  two  streams. 

II.  System  Organization  and  Parameters 

A  great  many  contemporary  computer  systems  provide  each 
programmer  with  a  paged  virtual  address  space,  larger  than  the 
main  memory  space  likely  to  be  available  when  he  runs  his 
program. They also provide a file system to permit programmers
to store variable numbers of variable objects (files) for
indefinite periods of time. We assume that the reader is
familiar with the terminology of demand paged virtual
memory and of file systems (see, for example, [14] or [36]).
Most  such  systems  use  multiprogramming,  so  that  main 
memory  will  contain  a  supply  of  active  programs  to  which  the 
processor  can  be  switched  should  the  one  it  is  working  on 
stop.  Since  a  running  program  typically  stops  because  it 
requires  service  from  some  device  other  than  the  processor, 
multiprogramming  improves  concurrency  in  the  use  of  all 
system  resources. 

Fig.  1  depicts  the  type  of  multiprogramming  system  under 
consideration  here:  a  network  of  interacting  service  stations. 
The  network  comprises  two  main  portions:  the  active  network 
contains  the  processor  and  input/output  (I/O)  stations,  while 
the  passive  network  contains  a  job  queue  and  policies  for 
admitting  new  programs  to  active  status.  A  program  is  active 
when  in  the  active  network;  only  when  there  is  it  eligible  to 
receive  processing  and  I/O  service,  and  to  have  pages  in  main 
memory.  The  number  of  active  programs  is  called  the  level  or 
degree  of  multiprogramming;  it  is  denoted  hereafter  by  n  or, 
in the case of time dependence, by n(t).

In Fig. 1, each active program is waiting for service from one
of the three stations in the active network. It waits at the file
I/O station whenever it requires one or more records of a file
to be transferred between a main memory buffer and the file
store (usually a disk); it waits at the paging I/O station whenever
it requires a page to be transferred between main memory
and the paging store (usually a drum); and otherwise it waits
at the processor station. The box labeled Job Queue contains
a set of enabled programs, a decision policy for activating
them, and a "load control" mechanism for controlling n(t).
New programs can be submitted from a batch-processing system
entry station, a collection of time-sharing terminals, or
both [5].

Inherent in the network of Fig. 1 is the notion that an active
program alternates between intervals of requiring processor
service and intervals of requiring an I/O transaction. Though
it is in principle possible for a single program to be using
concurrently both the processor and the I/O stations, the assumption
of no such concurrency is frequently met in practice:
demand paging guarantees disjointness of processing and
paging I/O, and few programmers ever achieve more than a
small percentage overlap between processing and file I/O.

At the completion of a processing interval, a program moves
to the file I/O station with probability $q_f$, to the paging I/O
station with probability $q_p$, or to the inactive state with
probability $q_0$; of course

$$q_f + q_p + q_0 = 1. \tag{2.1}$$

The service rates of the three stations are given by the parameters
$b_f$, $b_p$, and $b_0$; they denote the reciprocals of the mean
service times at their respective stations (e.g., $1/b_0$ is the mean
length of a processing interval).

[Fig. 1. Organization of multiprogramming system. Recoverable labels: New Programs, Terminated Programs.]

The network parameters $q_f$, $q_p$, and $q_0$ are derivable from
program parameters. Suppose the total file I/O, paging I/O,
and processing requirements of a program are denoted by $T_f$,
$T_p$, and $T_0$, respectively. (In our context, $T_f$ and $T_0$ are
entirely intrinsic to a program, whereas $T_p$ is not; how much
memory is allocated to a program, or what policy is used to
determine which of a program's pages reside in main memory,
significantly affects $T_p$. The system scheduling and memory
management policies can cause $T_p$ to vary over an extremely
wide range, from considerably smaller than $T_0$ to considerably
larger.) Let $a_f$ denote the rate at which a program requests
file I/O; its total number of file I/O requests is therefore $T_0 a_f$,
and the total time required to serve all of them is $T_f = T_0 a_f / b_f$.
Similarly, if $a_p$ denotes a program's paging rate, the total time
it spends on paging is $T_p = T_0 a_p / b_p$. Since on every processor
departure, a program chooses independently to leave the active
network with probability $q_0$, the mean number of passes
through the processor before leaving is $1/q_0$; and since the
mean time per pass is assumed to be $1/b_0$,

$$T_0 = \frac{1}{q_0 b_0}. \tag{2.2}$$


Of the $1/q_0$ passes a program makes on the processor, $(1/q_0) - 1$
of them were occasions on which it moved to an I/O station
(after the last pass, it exited active status); equating this to
the total number of I/O requests, $T_0(a_f + a_p)$, we find

$$q_0 = \frac{1}{1 + T_0(a_f + a_p)}. \tag{2.3}$$


Together with (2.2), this implies that $b_0 = a_f + a_p + 1/T_0$.
Since the fraction of processor passes after which a program
moves to the file I/O station is $q_f$, the number of visits it makes
there must be $q_f/q_0 = T_0 a_f$, which implies

$$q_f = T_0 a_f q_0 = \frac{T_0 a_f}{1 + T_0(a_f + a_p)}. \tag{2.4}$$




Similarly,

$$q_p = \frac{T_0 a_p}{1 + T_0(a_f + a_p)}. \tag{2.5}$$


It is clear from (2.3)-(2.5) that $q_0 + q_f + q_p = 1$ as required.
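
As a quick numerical check of (2.3)-(2.5), the following is a minimal sketch that computes the branching probabilities from the program parameters. The function name and the sample values are illustrative assumptions, not taken from the paper, and the processor rate is obtained from (2.2) read as T_0 = 1/(q_0 b_0), the form implied by the surrounding text.

```python
def network_parameters(T0, a_f, a_p):
    """Return (q0, qf, qp, b0) for total processing time T0, file I/O
    request rate a_f and paging rate a_p (all hypothetical inputs)."""
    total_io_requests = T0 * (a_f + a_p)   # total number of I/O requests
    q0 = 1.0 / (1.0 + total_io_requests)   # (2.3)
    qf = T0 * a_f * q0                     # (2.4)
    qp = T0 * a_p * q0                     # (2.5)
    b0 = 1.0 / (q0 * T0)                   # (2.2) rearranged (assumed form)
    return q0, qf, qp, b0


# Assumed example values: 1000 time units of processing, one file
# request per 500 units and one page fault per ~143 units.
T0, a_f, a_p = 1000.0, 0.002, 0.007
q0, qf, qp, b0 = network_parameters(T0, a_f, a_p)
print(q0, qf, qp, b0)
```

With these inputs the three probabilities come out near 0.1, 0.2 and 0.7, and the processor rate agrees with b_0 = a_f + a_p + 1/T_0, as the text states.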

If we now extend $T_0$, $a_f$, $a_p$, $b_f$, $b_p$ to be averages common
to all active programs, we can use the parameter values implied
by the preceding equations to study the average properties of
the network.

We now define $U_0$, $U_f$, and $U_p$ to be utilizations (fraction
of time busy) of the three stations, for given load and parameter
settings. In equilibrium, the mean flow of programs out
of the file I/O station must be $U_f b_f$ programs per unit time;
out of the paging I/O station, $U_p b_p$; and out of the processor,
$U_0 b_0$. Moreover, a fraction $q_f$ of $U_0 b_0$ must be input
to the file I/O station and, in equilibrium, the input flow must
be the same as the output flow there; hence

$$U_f b_f = U_0 b_0 q_f = U_0 b_0 (T_0 a_f q_0) = U_0 a_f \tag{2.6}$$

where (2.4) and (2.2) have been used to simplify. Define the
relative utilization of the file I/O station

$$R_f = U_f / U_0 = a_f / b_f. \tag{2.7}$$

Similarly, for the paging I/O station,

$$R_p = U_p / U_0 = a_p / b_p. \tag{2.8}$$
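
The simplification used in (2.6) can be spelled out step by step. This is a sketch that takes (2.2) in the form $T_0 = 1/(q_0 b_0)$, the reading implied by the surrounding text (mean number of passes $1/q_0$, mean time per pass $1/b_0$):

```latex
\begin{align*}
U_f b_f &= U_0 b_0 q_f            && \text{(flow balance at the file I/O station)} \\
        &= U_0 b_0 (T_0 a_f q_0)  && \text{by (2.4), } q_f = T_0 a_f q_0 \\
        &= U_0 a_f (q_0 b_0 T_0)  && \text{(regrouping)} \\
        &= U_0 a_f                && \text{by (2.2), } q_0 b_0 T_0 = 1 .
\end{align*}
```

Dividing both sides by $U_0 b_f$ gives (2.7); the same steps with $q_p = T_0 a_p q_0$ from (2.5) give (2.8).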

The relative utilization of the processor station is of course
$R_0 = 1$. It is important to note that $R_p$ can be interpreted as
the ratio of the mean paging I/O service time to mean uninterrupted
processing interval between page faults. (An analogous
statement can be made for $R_f$.) In other words, if
$S = 1/b_p$ is the mean paging I/O service time, and $L = 1/a_p$ is
the mean length of execution interval between page faults
(assuming the main memory access time is used for the unit of
virtual time), then (2.8) can be rewritten

$$R_p = S/L. \tag{2.9}$$

The foregoing discussion assumes that all active programs
have the same system parameters. If they do not, we can use
as an approximation suitable averages over all active programs.
For example, if a sequence of $k$ successive page faults (from
programs of different characteristics) terminate interfault
intervals of expected lengths $L_1, \ldots, L_k$, we can use
$L = (L_1 + \cdots + L_k)/k$ in (2.9). In reality, the processor utilization
is a function of all the intervals $L_1, \ldots, L_k$, not just their
average. As we have verified by simulations, however, the
use of the average $L$ appears to give predictions of utilization
within a few percent of the true utilization, and thus we felt
justified in using the simpler analysis based on averages over the
set of active programs. Nonetheless, the reader should keep in
mind that the use of these averages in fact constitutes an
approximation.
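
The averaging step feeding (2.9) can be sketched as follows. The function name and the numbers are hypothetical; times are expressed in units of the main memory access time, as the text assumes for virtual time.

```python
def relative_paging_utilization(service_time, interfault_intervals):
    """R_p = S / L of (2.9), with L averaged over the observed interfault intervals."""
    L = sum(interfault_intervals) / len(interfault_intervals)
    return service_time / L


# Hypothetical values: S = 10,000 memory cycles, three programs with different L_i.
R_p = relative_paging_utilization(
    service_time=10_000.0,
    interfault_intervals=[20_000.0, 50_000.0, 80_000.0],
)
```

Here the average interfault interval is 50,000 cycles, so the paging station is one fifth as heavily loaded as the processor.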

Concerning utilizations, a few points should be noted. First,
the ratio $R_f$ depends only on the intrinsic program parameter
$a_f$ and the (fixed) file I/O station rate $b_f$; it cannot be affected
by memory management policies. In contrast, the ratio $R_p$
depends on the paging rate $a_p$, which can be controlled by
the system. Therefore, in our context, the paging rate is the
critical parameter. Second, the relative utilizations $R_f$ and
$R_p$ do not depend on the load (level of multiprogramming).
However, the absolute utilizations do. Under general
assumptions, one can show that

$$U_0 = U_0(n, R_f, R_p) \qquad (2.10)$$

that is, the absolute processor utilization depends only on the
load and the relative utilizations [5], [6], [10]. Once $U_0$ is
found, the other utilizations can be obtained from $U_f = U_0 R_f$
and $U_p = U_0 R_p$. Third, if $R_f$ and $R_p$ are fixed, $U_0$ must be
an increasing function of load, for a new active program must
increase the absolute utilization of any station at which it
queues, and, because the utilizations are in fixed ratios, all
other absolute utilizations must increase. Therefore,

$$U_0(n+1, R_f, R_p) > U_0(n, R_f, R_p). \qquad (2.11)$$

Fourth, as load increases, the utilization of the station having
the maximum relative utilization must approach 1 at least as
fast as the others. Let

$$R_m = \max [R_f, R_p, R_0]. \qquad (2.12)$$

Since $U_0 = U_m / R_m$, the fact of $U_m$ approaching 1 fastest
implies

$$U_0(n, R_f, R_p) \le 1/R_m \qquad (2.13)$$

with near equality for large enough $n$. Since $R_m \ge 1$, the
maximum possible value of $U_0$ may in fact be less than 1. This
shows that the designer of a system which apparently is unable
to achieve processor utilization 1 cannot immediately conclude
that an improvement in the memory management policy
will increase $U_0$. A slow file I/O station, or excessive rate of
file I/O requests, can cause $R_m = R_f > 1$. In this case, the file
I/O station, rather than the paging I/O station, limits processor
utilization. Usually, however, adequate buffering keeps $R_f < 1$,
so that reductions in $R_p$ are likely to improve performance.
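
To make (2.10)-(2.13) concrete, the sketch below evaluates the processor utilization for one specific model: a closed three-station network with exponential, load-independent servers, solved by Buzen's convolution algorithm. That model is an assumption introduced here for illustration (the text only appeals to "general assumptions"), and the function name is hypothetical.

```python
def processor_utilization(n, R_f, R_p):
    """U_0(n, R_f, R_p) for a closed product-form network with n >= 1 programs.

    Assumes exponential single-server stations with relative loads
    R_0 = 1 (processor), R_f (file I/O), and R_p (paging I/O).
    """
    loads = [1.0, R_f, R_p]
    g = [1.0] + [0.0] * n            # g[k] accumulates the normalization constant G(k)
    for x in loads:                  # Buzen's convolution, one station at a time
        for k in range(1, n + 1):
            g[k] += x * g[k - 1]
    return g[n - 1] / g[n]           # U_0 = R_0 * G(n-1) / G(n), with R_0 = 1


# Hypothetical loads: paging station is the bottleneck, so R_m = 2.
utilizations = [processor_utilization(n, R_f=0.5, R_p=2.0) for n in range(1, 21)]
```

Under these assumptions the computed values increase with $n$, as in (2.11), and approach but never exceed $1/R_m = 0.5$, as in (2.13).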

These properties show what happens when load is changed
and other parameters are held fixed. In studying memory
policies, it is frequently possible to vary the paging rate while
holding load and other parameters fixed. From the foregoing,
it follows that

$$U_0(n, R_f, R_p') \ge U_0(n, R_f, R_p), \qquad R_p' < R_p. \qquad (2.14)$$

In words, changing the paging rate from $a_p$ to $a_p' < a_p$ cannot
decrease processor utilization. For if reducing $R_p$ were to
cause $U_0$ to decrease, then $U_f = U_0 R_f$ would decrease as
well, implying a decrease in the utilizations of all stations,
which is patently impossible without reducing the load.

Since the throughput rate of the active network of the system
is the flow out, viz.,

$$\lambda = U_0 b_0 q_0 = U_0 / T_0 \qquad (2.15)$$

it follows that increasing processor utilization for a given level
of multiprogramming improves the system’s ability to complete
work at that load level. For multiprogramming level $n$,
Little’s formula tells that the response time in the active set is

$$W = n / \lambda = n T_0 / U_0 \qquad (2.16)$$

that is, increasing $U_0$ without changing $n$ will decrease response
time. Therefore, decreasing the relative utilization of the
paging I/O station by an improvement in memory policy without
changing the load will concomitantly increase throughput
and decrease response time. For this reason, processor
utilization is a suitable measure of performance.
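
Equations (2.15) and (2.16) can be evaluated directly. The following minimal sketch uses hypothetical function names and illustrative values; it shows that raising processor utilization at a fixed multiprogramming level raises throughput and lowers response time.

```python
def throughput(U0, T0):
    """Programs completed per unit time, lambda = U0 / T0, as in (2.15)."""
    return U0 / T0


def response_time(n, U0, T0):
    """Mean time in the active set by Little's formula, W = n * T0 / U0, as in (2.16)."""
    return n * T0 / U0


# Hypothetical values: n = 4 programs, T0 = 10 time units of processing each.
before = (throughput(0.5, 10.0), response_time(4, 0.5, 10.0))
after = (throughput(0.8, 10.0), response_time(4, 0.8, 10.0))
```

With utilization 0.5 the response time is 80 time units; raising utilization to 0.8 at the same load cuts it to 50.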


DENNING  AND  GRAHAM:  MULTIPROGRAMMED  MEMORY  MANAGEMENT 



The previous observations about processor utilization do not
consider what happens when an increase in load implies an
increase in page-fault rate because of programs having less
space available. Systems under memory constraint exhibit an
optimal level of multiprogramming, $n_0$, with $U_0$ maximum at
$n_0$ [5]. The reason is that the overall paging rate $a_p$ is an implicit
function of load, with $a_p(n + 1) > a_p(n)$. However, for $n < n_0$,
the increase in paging is unable to offset the increase of
utilization effected by increased load; but for $n \ge n_0$, paging
increases more rapidly and utilization decreases. An extreme
case will illustrate. Suppose total main memory is $M$ pages
and each active program receives space $x = M/n$ under load $n$.
Take $R_f = 1$ and $R_p$ to be the step function

$$R_p = a_p(x)/b_p = \begin{cases} 100, & x < x_0 \\ 1/100, & x \ge x_0. \end{cases}$$

In other words, these programs page at a high rate when their
memory allocations are small, and at a low rate otherwise.
This implies that the processor utilization has the form [cf.
(2.13)]

$$U_0(n, R_f, R_p) = \begin{cases} U_0(n, 1, 1/100) \le 1/R_m = 1, & n \le M/x_0 \\ U_0(n, 1, 100) \le 1/R_m = 1/100, & n > M/x_0. \end{cases}$$

This is suggested in Fig. 2. The optimal degree of multiprogramming
is $n_0 = M/x_0$. The effect suggested here, known as
thrashing, is not usually so abrupt as this example shows.
However, in many practical situations, changing the load
from $n_0$ to $n_0 + 1$ or $n_0 + 2$ is sufficient to cause a serious
drop in utilization.
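
The extreme case above can be tabulated with a few lines of code. This is a minimal sketch of the bound (2.13) only, not of the utilization itself; the function name and the values $M = 100$ and $x_0 = 20$ are assumptions chosen for illustration.

```python
def utilization_bound(n, M, x0):
    """Upper bound 1/R_m on processor utilization for the step-function example."""
    x = M / n                                    # space per program under load n
    R_f = 1.0
    R_p = 100.0 if x < x0 else 1.0 / 100.0       # step function for the paging load
    R_m = max(R_f, R_p, 1.0)                     # R_0 = 1 for the processor
    return 1.0 / R_m


# Hypothetical memory of M = 100 pages and threshold x0 = 20 pages, so n_0 = 5.
bounds = {n: utilization_bound(n, M=100, x0=20) for n in range(1, 9)}
```

The bound stays at 1 up to $n = 5$ and collapses to 1/100 at $n = 6$, which is the abrupt form of thrashing the example describes.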

The optimal level of multiprogramming can vary from one
set of active programs to another, because page-fault rates vary
among programs: thus $n_0 = n_0(t)$. To avoid thrashing, it is
necessary to include a load control mechanism in the system
scheduler (Job Queue in Fig. 1), whose purpose is to adjust
dynamically the level of multiprogramming so that most of the
time $n(t) \le k n_0(t)$ for some small constant $k$. Even a
simple limit $N$ on $n(t)$ may not successfully control thrashing,
unless $N$ has been set low enough so that the event $k n_0(t) < N$
is unlikely; but then the system is probably operating at a
significantly suboptimal load a goodly portion of the time.
Load controls which attempt to maintain $n(t) = n_0(t)$ and
which thereby keep the system operating at top efficiency
will be discussed later. (See also [32] and [42].)

In  summary,  we  have  examined  a  network  representation  of 
the resources used by active programs in a typical multiprogramming
environment. The purpose was to establish that the
page-fault rate is the critical parameter, and that memory
policy  changes  that  improve  it  without  changing  load  or  other 
system  parameters  can  be  expected  to  improve  processor 
utilization,  increase  throughput,  and  decrease  response  time. 
To  show  whether  a  proposed  change  in  the  memory  policy 
will  improve  processing  efficiency,  it  is  usually  sufficient  to 
show  that  the  change  does  not  increase  any  program’s  paging 
rate  or,  equivalently,  that  it  decreases  the  relative  utilization  of 
the  paging  I/O  station.  We  also  showed  that  there  is  an 
optimum  load,  that  a  load  control  mechanism  is  required  to 
prevent  thrashing,  and  that  load  control  must  be  coupled  to 
the  memory  policy. 

III.  Program  Behavior  and  Parameters 

A program in execution will generate a sequence of references
(known as an address trace) to information in its virtual
address space. The reference string of the program is a sequence

$$R = r(1)\, r(2) \cdots r(k) \cdots r(K) \qquad (3.1)$$

in which $r(k)$ is the number of the page containing the virtual
address referenced at time $k$, where $k = 1, 2, \ldots, K$ measures
execution time, or virtual time. The pages the program has
present in main memory constitute its resident set; the resident
set just before the $k$th reference is denoted by $Z(k)$, and its size
(in pages) by $z(k)$. A page fault occurs at virtual time $k$ if $r(k)$
is not in $Z(k)$. Under the assumption of demand paging,
$Z(k + 1)$ is the same as $Z(k)$ plus $r(k)$, less any pages of $Z(k)$
replaced (that is, removed from main memory) by the memory
policy; moreover, $z(k + 1) \le z(k) + 1$. The memory policy thus
determines the sequence of resident sets $Z(1) Z(2) \cdots Z(K)$
that arises while processing a reference string $R$ and, hence,
the paging rate experienced by the program generating $R$.

Let $t_1, t_2, \ldots, t_K$ denote the (real) time instants at which
the references of a reference string $R$ commence. The resident
set at time $t$, where $t_{k-1} \le t < t_k$, is the same as that at time
$t_{k-1}$, less any pages which have been replaced; thus

$$Z(t_k) \subseteq Z(t) \subseteq Z(t_{k-1}) + r(k - 1). \qquad (3.2)$$

It is important to keep the distinction clear: the behavior of a
given program is formulated with respect to its virtual time,
whereas the behavior of a system is formulated with respect
to real time.

For  reasons  already  discussed,  the  page-fault-rate  function  is 
important  in  any  study  of  memory  management.  Denoted  by 
$f(A, x)$, this function gives the expected number of page faults
generated  per  unit  of  virtual  time  when  a  given  reference 
string  R  is  processed  by  memory  policy  A,  subject  to  main 
memory  space  constraint  x.  Since  most  of  the  results  depend 
only  on  properties  which,  being  common  to  most  fault-rate 
functions,  are  relatively  independent  of  the  particular  R  that 
arises,  R  will  not  be  shown  as  an  explicit  parameter  of  these 
functions;  however,  the  dependence  should  not  be  forgotten 
altogether. 

For the case of fixed memory allocation, the space constraint
$x$ is interpreted to mean that the resident set sizes must
satisfy $z(k) \le x$ for all virtual times $k$. For the case of
variable-space allocation, the space constraint $x$ is interpreted to mean
that the average resident set size is $x$:

$$x = \frac{1}{K} \sum_{k=1}^{K} z(k). \qquad (3.3)$$

It is assumed that the policy $A$ has parameters which can be
adjusted so that (3.3) can be satisfied for a range of choices of
$x$. Examples of both fixed- and variable-space policies will be
considered below.

Examples of commonly studied fixed-space policies include
the following: least recently used (LRU) which, at a page-fault
time, replaces the least recently referenced page of the resident
set; first in, first out (FIFO) which, at a page-fault time,
replaces the longest resident page; random (RAND) which, at a
page-fault time, replaces a randomly chosen page from the
resident set; and optimal (OPT) which, at a page-fault time,
replaces the resident set page that will not be referenced again
for the longest time. Of these, OPT cannot be implemented
(it requires foreknowledge), FIFO is simplest to implement (it
requires arranging the resident set pages in an order-of-arrival
queue), and LRU is the most robust, providing consistently
the lowest (of non-OPT policies) fault rate over the widest
class of reference strings [1]. Although OPT is not
implementable, it can be used a posteriori to compare various
algorithms against optimum; and its principle, choosing for
replacement the page with maximum “forward distance,” can
easily be used to construct reference strings for which LRU is
optimal or approximately so (see Appendix I). If the memory
policy $A$ is a member of the large class of “stack algorithms”
[10], [24], the fault-rate function $f(A, x)$ is nonincreasing in
$x$ for every reference string, and may be computed by a highly
efficient procedure. Of the foregoing, all but FIFO are stack
algorithms.
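
The difference between a stack algorithm and FIFO can be seen on a small example. The sketch below simulates both policies under demand paging with a fixed resident set of $x \ge 1$ pages. The function names are hypothetical, and the reference string is the short textbook string commonly used to exhibit Belady's anomaly; it is not taken from the paper.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, x):
    """Count page faults for FIFO replacement with x >= 1 page frames."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == x:
                resident.discard(queue.popleft())   # evict the longest resident page
            resident.add(page)
            queue.append(page)
    return faults


def lru_faults(refs, x):
    """Count page faults for LRU replacement with x >= 1 page frames."""
    resident, faults = OrderedDict(), 0   # keys ordered from least to most recently used
    for page in refs:
        if page in resident:
            resident.move_to_end(page)
        else:
            faults += 1
            if len(resident) == x:
                resident.popitem(last=False)        # evict the least recently used page
            resident[page] = True
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
fifo_counts = {x: fifo_faults(refs, x) for x in (3, 4)}
lru_counts = {x: lru_faults(refs, x) for x in (3, 4)}
```

On this string FIFO faults more often with four frames than with three, whereas the LRU count does not increase with $x$, consistent with the nonincreasing property of stack algorithms.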

Much of our attention will be directed toward the LRU
policy, or procedures resembling it. Associated with an
instance of this policy is a dynamic list known as the LRU stack,
which arranges the referenced pages from top to bottom by
decreasing recency of reference. At a page replacement time,
the LRU policy chooses the lowest ranked page in the stack;
therefore, the contents of an $x$-page resident set must always
be the pages occupying the first $x$ stack positions. When a page
is referenced, the stack is updated by moving the referenced
page to the top and pushing down the intervening pages by
one place. The position at which the referenced page was
found in the stack before being promoted to the top is called
its stack distance. A page fault occurs in an $x$-page resident set
at a given reference, if and only if the stack distance of that
reference exceeds $x$. These ideas form the basis of an efficient
procedure for computing the fault-rate function $f(\mathrm{LRU}, x)$
by counting stack distances in a reference string (see Appendix
I). Fig. 3 shows a typical fault-rate function. It has the
terminal values $f(\mathrm{LRU}, 0) = 1$ and $f(\mathrm{LRU}, N) = N/K$ for an
$N$-page program and reference string of length $K$. For large $K$,
the function is typically convex, which is considered a
manifestation of program locality (see below).
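
The stack-distance procedure described above can be sketched in a single pass over the reference string. This is an illustrative implementation, not the one in Appendix I; the function name is hypothetical, and a first reference to a page is treated as having a stack distance larger than any resident set size.

```python
def lru_fault_rates(refs, n_pages):
    """Return [f(LRU, 0), ..., f(LRU, n_pages)] from one pass over refs."""
    stack = []                          # stack[0] is the most recently used page
    counts = [0] * (n_pages + 2)        # counts[d] = number of references at distance d
    for page in refs:
        if page in stack:
            d = stack.index(page) + 1   # 1-based stack distance
            stack.remove(page)
        else:
            d = n_pages + 1             # first reference: faults at every size
        counts[d] += 1
        stack.insert(0, page)           # promote the referenced page to the top
    K = len(refs)
    rates, faults = [], K
    for x in range(n_pages + 1):
        faults -= counts[x]             # references with distance <= x do not fault
        rates.append(faults / K)
    return rates


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
rates = lru_fault_rates(refs, n_pages=5)
```

The result reproduces the terminal values quoted in the text: the rate is 1 at $x = 0$ and $N/K = 5/12$ at $x = N = 5$, and every intermediate size is obtained without re-simulating the policy.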

Though powerful, analysis of given reference strings under
fixed-space policies does not account for the mechanisms by
which programs generate reference strings; moreover, the
procedures do not readily extend to the analysis of
variable-space policies. To deal with this, a model is useful. Regard a
program’s execution time as being partitioned into a sequence
of phases, a phase being an interval of constant memory
requirement. Similarly, the program’s address space is partitioned
into segments, a segment being a named block of contiguous
addresses. A given segment is considered “active” in a
given phase if processing of that phase requires the presence
of that segment in main memory, that is, in the resident set.
The set of all segments active in a given phase is called the
locality set, or locality, of that phase; the locality set at a
given instant of real time is the same as that of the phase in
progress at that time [38], [40]. The validity of this abstraction
has been verified over and over again, for example, in the
experiments of Rodriguez-Rosell [31], Hatfield and Gerald
[23], or Ferrari [19]. It is always observed that many distinct
and readily identified phases exist, that during each an
often small subset of the program’s segments is active, and
that the locality sets are often disjoint and of highly variable
sizes.

Fig. 3. Typical LRU fault-rate function (axis label: pages/time).

Though  not  of  direct  concern  here,  the  distribution  of  seg¬ 
ments  among  pages  can  have  a  significant  effect  on  locality 
[19],  [23].  In  case  a  large  segment  is  allocated  among  several 
smaller  pages,  its  activity  implies  that  of  all  its  pages,  so  that 
the  original  locality  properties  remain  observable.  In  case  a 
number  of  small  segments  are  allocated  on  one  larger  page, 
the  assignment  can  be  critical.  Scattering  the  segments  of  one 
locality  among  many  pages  will  effectively  mask  the  locality 
properties  of  the  original  program,  making  it  appear  as  if  the 
locality  set  of  every  phase— measured  now  in  pages— is  very 
nearly  all  the  address  space. 

The  notion  of  localities  and  program  phases  is  somewhat 
more  viable  in  the  context  of  generating  page  reference  strings 
[38],  [40]  than  in  the  context  of  designing  memory  policies, 
simply  because  prior  knowledge  of  localities  and  phase  bound¬ 
aries  is  not  available  in  the  latter  context.  Instead,  a  memory 
policy  must  include  some  method  of  measuring  or  estimating 
the  locality  of  a  program  at  each  instant,  and  the  estimator, 
thus  defined,  can  be  used  to  specify  the  content  (and  the  size, 
if  that  is  adjustable)  of  the  resident  set  of  a  program.  The 
generic  term  working  set  is  usually  used  to  denote  an  estimator 
of  a  locality  set.  Just  as  there  is  a  wide  range  of  fixed-space 
paging  algorithms,  there  is  a  wide  range  of  locality  estimating 
techniques  and  a  range  of  variable-space  policies  based  on 
them.  A  characterization  of  this  range  will  be  presented  in 
Section  IV. 

Perhaps the best known locality estimation method is the
moving-window working set. Its analysis methods are even
more fully developed than those for fixed-space paging
algorithms [10], [12], [13], [15], [22], [28], [37]. For a
parameter $T$ known as the window size, the working set
$W(k, T)$ of a program at virtual time $k$ is the collection of
pages referenced by that program in the $T$ references
preceding and including the one at time $k$ (that is,
$W(k, T) = \{r(k - T + 1), \ldots, r(k)\}$); if $k < T$, $W(k, T) = W(k, k)$. The
size of the working set is denoted by $w(k, T)$. If $\{t_k\}$ are the
real-time instants corresponding to virtual times $\{k\}$, and
$t_k \le t < t_{k+1}$, define $W(t, T) = W(k, T)$. The missing-page
rate $m(T)$ is the rate at which new pages are entering the working
set. Under the pure working-set memory policy (WS),
which allocates the resident set as $Z(t) = W(t, T)$, $m(T)$
becomes the page-fault rate. The mean working set size is
denoted $s(T)$; it is an increasing concave function whose slope
may be interpreted as $m(T)$ (see Appendix I). A fault-rate
function $f(\mathrm{WS}, x)$ giving directly the relation between (mean)
space and paging rate can be defined parametrically by setting

$$f(\mathrm{WS}, s(T)) = m(T), \qquad T = 0, 1, 2, \ldots. \qquad (3.4)$$

Fig. 4. Working set properties, and construction of fault-rate function
(axis labels: pages, window size).

All the functions $m(T)$, $s(T)$, and $f(\mathrm{WS}, x)$ can be computed
efficiently (see Appendix I).
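
For a short reference string, the working-set quantities can be computed by brute force straight from the definitions. The sketch below is not the efficient procedure of Appendix I; the function name and the example string are assumptions, and a reference is counted as a missing page when it is absent from the working set just before it.

```python
def working_set_stats(refs, T):
    """Return (s(T), m(T)): mean working-set size and missing-page rate for window T."""
    K = len(refs)
    total_size, misses = 0, 0
    for k in range(1, K + 1):                       # virtual time k = 1, ..., K
        # W(k-1, T): the working set just before reference r(k)
        before = set(refs[max(0, k - 1 - T):k - 1])
        if refs[k - 1] not in before:
            misses += 1                             # r(k) enters the working set
        # W(k, T): pages in the last T references up to and including r(k)
        total_size += len(set(refs[max(0, k - T):k]))
    return total_size / K, misses / K


refs = [1, 2, 1, 3, 1, 2]                           # hypothetical 3-page program, K = 6
# Points (s(T), m(T)) trace out the fault-rate function f(WS, x) of (3.4).
curve = [working_set_stats(refs, T) for T in range(len(refs) + 1)]
```

At $T = 0$ the point is $(0, 1)$, and for $T$ equal to the string length the missing-page rate falls to $N/K = 3/6$, matching the terminal behavior described for $m(T)$.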

Fig. 4 shows the mean working-set-size function $s(T)$, the
missing-page-rate function $m(T)$, and the construction of the
fault-rate function $f(\mathrm{WS}, x)$. The curve $s(T)$ approaches a
value $s_{\max}$, which it attains for some $T \le K$, where $K$ is the
reference string length. In general, $s_{\max} \le N$, $N$ being the
program size; however, $s_{\max}$ need not equal $N$, since the
program need not reference some of its pages until later phases,
whereupon early working set sizes must be less than $N$. The
function $m(T)$ is decreasing to the value $N/K$. The fault-rate
function $f(\mathrm{WS}, x)$ is defined only for $0 \le x \le s_{\max}$, with
terminal values $f(\mathrm{WS}, 0) = 1$ and $f(\mathrm{WS}, s_{\max}) = N/K$; it is
decreasing since $s(T)$ is increasing and $m(T)$ is decreasing.

Fig. 5 presents a typical comparison of $f(\mathrm{WS}, x)$ and
$f(\mathrm{LRU}, x)$. Define the point $x_0$ as the smallest space for
which $x \ge x_0$ implies $f(\mathrm{WS}, x) \le f(\mathrm{LRU}, x)$, that is, WS is at
least as good as LRU. The point $x_0$ appears not to exceed the
mean locality set size of the reference string. Thus a program
with one phase and one locality set will have $x_0 = s_{\max}$,
while a program with many phases and a wide variance among
locality set sizes will tend to have $x_0$ much smaller than $s_{\max}$
and $N$. The reason is that WS is able to adapt its resident set
to be the current locality set estimate, having little or no paging
whenever the window is contained wholly in a phase, whereas
LRU will produce streams of page faults in those phases whose
locality exceeds the size of its resident set. This behavior will
be observed even for reference strings over which LRU is
optimal, and for reference strings processed by the optimal
fixed-space policy OPT; it is a direct result of the variability
of locality size.

Fig. 5. Comparison of LRU and WS fault-rate functions (axis label: pages/time).

Experiments  by  Prieve  and  Fabry  [28],  [29]  indicate  that 
differences  of  30  percent  or  more  between  the  LRU  and  the 
WS  curves  in  the  range  x  >  Xq  occur  frequently,  showing  that 
important  variations  in  locality  size  are  significant  in  program 
behavior.  (See  Appendix  I  for  examples.)  Further  experiments 
demonstrate  that  an  optimal  variable-space  algorithm  could  in 
principle  produce  another  30  percent  or  more  improvement 
over  WS  (60  percent  or  more  over  LRU),  showing  that  the 
working  set  is  not  a  perfect  estimator  of  locality  [30] .  (How¬ 
ever,  like  the  fixed-space  optimal  algorithm,  OPT,  the  one 
studied  by  Prieve  requires  foreknowledge  of  the  reference 
string.  Its  primary  interest  is  in  assessing  how  effective  a 
locality  estimation  procedure  is.) 

Another measure of page faulting is the lifetime function
$L(x)$, which gives the mean virtual time between page faults
when a reference string is processed under a given memory
policy with space constraint $x$. (See [2], [3], [7].) It is
defined simply as

$$L(x) = 1/f(x) \qquad (3.5)$$

where $f(x)$ is a fault-rate function. Fig. 6 shows a typical
lifetime function for the LRU policy. There is usually a value $y_0$,
determined by a chord through $(0, L(0))$ tangent to the curve.
The interval $(0, y_0]$ is the convex region and $(y_0, \infty)$ is the
concave region. For some fixed-space policies, the convex part
can be approximated by $c x^k$ with $1.5 < k < 2.5$ [2], [3],
[7], [39]; no one has ventured approximations yet for the
convex part of $L(x)$ for variable-space policies. The
convex/concave shape is characteristic of many (but not all) lifetime
functions. For fixed-space policies, $y_0$ is approximately the
size of the largest locality set, whereas for the WS policy it
tends to be approximately the average (over virtual time) of the
locality set sizes. (See Appendix I.) The properties of the
lifetime function have been used to demonstrate efficiency
increases in certain cases, where such increases could not be
deduced directly from the properties of the fault-rate function
[7], [21], [39].
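
On sampled data, the tangent construction for the knee can be approximated by choosing the point that maximizes the slope of the chord from $(0, L(0))$. The sketch below makes that assumption explicit; the function names and the fault-rate samples are hypothetical, not measurements from the paper.

```python
def lifetime(fault_rates):
    """L(x) = 1 / f(x) for fault-rate samples f(0), f(1), ..., as in (3.5)."""
    return [1.0 / f for f in fault_rates]


def knee(L):
    """Index x >= 1 that maximizes the chord slope (L[x] - L[0]) / x."""
    return max(range(1, len(L)), key=lambda x: (L[x] - L[0]) / x)


# Hypothetical fault-rate samples f(0), ..., f(5) with a convex/concave lifetime curve.
fault_rates = [1.0, 0.5, 0.2, 0.05, 0.04, 0.035]
L = lifetime(fault_rates)
y0 = knee(L)
```

For these samples the lifetime grows slowly up to $x = 2$, jumps at $x = 3$, and flattens afterwards, so the chord from the origin of the curve touches it at $x = 3$.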

In summary, a program’s page reference string is an observable
quantity, from which one can compute fault-rate functions
for various memory policies, notably LRU and WS. The
abstractions of phases and localities can be used to explain the
relative behaviors of programs under LRU and WS policies.
The lifetime function, which is the reciprocal of the fault-rate
function, characteristically exhibits a convex/concave shape.
The existence of a convex region in the lifetime function will
be used in Section IV to deduce that certain variable-space
(non-WS) policies can produce a net reduction in paging rate.
Together with the results of Section II, this implies a net
improvement in system performance.

IV.  Classification  of  Memory  Policies 

Denote by $P_1, \ldots, P_n$ the set of active programs during a
time interval in which the level of multiprogramming is fixed
($n = n(t)$). Associated with $P_i$ at time $t$ is its resident set $Z_i(t)$,
containing $z_i(t) \ge 1$ pages. The configuration of memory is
represented by a partition vector

$$Z(t) = (Z_1(t), \ldots, Z_n(t)). \qquad (4.1)$$

The partition size vector is

$$z(t) = (z_1(t), \ldots, z_n(t)) \qquad (4.2)$$

in which

$$z_1(t) + \cdots + z_n(t) \le M \qquad (4.3)$$

at every time instant $t$, where $M$ is the size of the main memory.
The reserve memory is that portion unused by any active
program; its size at time $t$ is

$$R(t) = M - \sum_{i=1}^{n} z_i(t). \qquad (4.4)$$
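
The bookkeeping in (4.1)-(4.4) amounts to a few lines of code. In this minimal sketch the resident sets are represented as Python sets of page numbers, one per active program; the function name and the example values are assumptions made for illustration.

```python
def reserve_memory(M, resident_sets):
    """Return R(t) of (4.4) for a partition given as one set of pages per program."""
    sizes = [len(Z) for Z in resident_sets]       # partition size vector, (4.2)
    if any(z < 1 for z in sizes):
        raise ValueError("every active program must hold at least one page")
    if sum(sizes) > M:
        raise ValueError("partition violates constraint (4.3)")
    return M - sum(sizes)


# Hypothetical configuration: M = 16 page frames and three active programs.
partition = [{0, 1, 2, 3}, {0, 1}, {0, 1, 2, 3, 4, 5}]
R = reserve_memory(16, partition)
```

Page numbers are local to each program's virtual address space, so the same number may appear in more than one resident set without any frame being shared.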

As  has  been  noted,  a  memory  policy  can  be  regarded  as  in¬ 
cluding  a  method  of  estimating  program  locality  sets.  The 
estimates  thus  determined  are  used  to  specify  the  content  (and 
size,  if  adjustable)  of  each  program’s  resident  set.  Fig.  7 
suggests  a  classification  of  memory  policies  based  on  the 
method  used  to  estimate  the  locality.  It  will  be  used  as  the 
basis  for  the  ensuing  discussion  of  memory  policies.  Our  objec¬ 
tive  is  to  show  why  performance  improvements  can  be  expected 
under  a  policy  improvement  corresponding  to  a  rightward 
change  along  the  bottom  of  the  diagram  in  Fig.  7. 

When main memory allocation is controlled by the programmer,
who inserts into the program commands that move information
in and out of main memory, memory management is
said to be manual. The viability of this type of management
is usually limited to systems in which the resident set size is
fixed and known in advance by the programmer, who is then
in a position to optimize information placement and flow with
respect to that resident set size. In contrast, memory management
handled by the system is said to be automatic.

MEMORY MANAGEMENT
    MANUAL
    AUTOMATIC
        FIXED PARTITION: BALANCED, IMBALANCED
        VARIABLE PARTITION: V1, V2, W
(Bottom row, left to right, in order of increasing performance:
BALANCED, IMBALANCED, V1, V2, W.)

Fig. 7. Classification of memory policies.

Manual memory management has fallen out of favor for a
variety of reasons. One is simply mounting experience that
properly designed automatic management mechanisms (e.g.,
virtual memories) can perform at least as well for large programs
as carefully planned overlays [35]. Another is the use of
multiprogramming and multiplexed resource allocation, which
rob the programmer of the key assumption that a resident set
of known size will be continuously available to him. In a
multiprogrammed environment, each active program $P_i$ cannot
be guaranteed good performance even if it has complete control
over its resident set $Z_i(t)$, because attempted local optimizations
need not imply that the entire system is optimized.
The problem is that the individual who programmed $P_i$ does
not have access to information about $P_j$ ($j \ne i$) and is therefore
not in a position to optimize his performance in relation to the
system’s; moreover, there is no guarantee he would use this
information properly even if he did have it. Therefore,
multiprogrammed memory management is always automatic.

Policies of automatic memory management can be grouped
in two categories: fixed partitioning and variable partitioning.
The latter has already been defined, in terms of a time-varying
partition vector $Z(t)$; techniques of varying the partition will
be discussed below. If the resident set size $z_i(t)$ is a fixed
constant $z_i$ for all $t$ during which $P_i$ is active, then the size
vector $z(t)$ is constant during any interval in which the set of
active programs is fixed; this is known as the fixed-partition
approach. In case the entire address space $A_i$ of $P_i$ can fit in the
allocated space of $z_i$ pages, the resident set is also fixed:
$Z_i(t) = A_i$. Otherwise, if $z_i$ is smaller than the size of $A_i$, a
replacement policy must be used to define what subset of $A_i$
constitutes $Z_i(t)$; note in this case that $Z_i(t)$ varies even though
$z_i$ does not. In case $z_i = M/n$ for each $i$, $Z(t)$ is called a
balanced partition or equipartition.

Imbalanced partitions are capable of better processing
efficiency, if only because they permit the flexibility of
allocating more memory to programs with larger locality sets.
However, even if all programs have identical locality
properties, it frequently happens that any imbalanced partition is
more efficient than a balanced partition (see Appendix II and
[7], [21], [39]).

In either the fixed- or variable-partition approach, demand
paging is ordinarily used to acquire a program’s pages into
main memory while that program is active. In case latency
time at the paging I/O station is a problem, some form of
swapping may be used to load a resident set at the beginnings
and ends of a program’s active intervals.

Arguments  in  support  of  fixed  partitioning  are  of  two  types. 
One  is  founded  on  a  belief  that  memory  availability  in  a  sys¬ 
tem,  and  the  memory  requirements  of  any  given  program,  can 
be  predicted  prior  to  program  processing.  The  other  is  the  ap¬ 
parent  low  overhead  of  implementation,  since  partition  changes 
occur  as  infrequently  as  possible,  viz.,  when  the  set  of  active 
programs  changes.  The  first  argument  is  weak  for  the  same 
reasons  that  arguments  for  manual  overlays  are  weak.  The 
second  argument’s  weakness  is  revealed  when  one  accounts  for 
changing locality in a program. Consider for a moment the
behavior of a fixed partition $z$ when the set of active programs
$P_1, \ldots, P_n$ has a large variance in locality set size across
time. Because the partition is fixed, there is no way to reallocate
pages from $Z_i$ to $Z_j$ at a time when $P_i$'s locality is smaller
than $Z_i$ and $P_j$'s locality is larger than $Z_j$, when clearly such a
reallocation would not degrade $P_i$'s performance but would
improve $P_j$'s. Coffman and Ryan have analyzed this effect, and
have  concluded  that  the  variance  in  locality  size  from  one  pro¬ 
gram  phase  to  another  is  ordinarily  large  enough  to  produce  a 
gain  in  memory  utilization  so  significant  that  it  recovers  the 
cost  of  implementing  a  variable  partition  several  times  over 
[9]. Put another way, there is a hidden overhead in the fixed
partition: a severe loss of storage utilization for programs with
wide variance of locality size. When accounted for, this overhead
significantly diminishes the attraction of fixed-partition
strategies.

Within  the  class  of  variable-partition  strategies,  one  may 
identify  at  least  three  subclasses,  according  to  whether  there  is 
no  correlation,  weak  correlation,  or  high  correlation  with 
locality  changes  of  programs. 

Class V1: The partition $Z(t)$ is varied, but with no explicit
correlation to the reference patterns of the active
programs.

Class V2: Variation in $Z(t)$ is explicitly correlated with the
activity with which active programs reference
pages in their resident sets, but there is no explicit
attempt to identify locality sets and protect them
from preemption.

Class W: The resident sets $Z(t)$ are maintained in one-to-one
correspondence with working sets (estimates of
the locality sets) of active programs.

It should be noted that Class W policies are intended to be
precisely the “working set policies” [13], [14].

As noted in Section II, the efficiency of a multiprogrammed
computer system depends on a load control mechanism keeping
the system away from thrashing. The objective is to control
the level of multiprogramming $n(t)$ by activating or
deactivating programs so that most of the time

$$n(t) < k\,n_0(t) \qquad (4.5)$$

where $n_0(t)$ is the optimum level of multiprogramming and
$k \ge 1$ is a small constant. Class W policies have an inherent
load control; they will force the deactivation of a program at a
page-fault time when the memory reserve $R(t) = 0$; and they
will defer the activation of a new program whose initial working
set is estimated at size $z$, until

$$z < R(t) - H, \quad H > 0. \qquad (4.6)$$

The parameter $H$ is adjusted so that the rate of program
deactivations caused by $R(t) = 0$ at a page-fault time is low [9],
[33]. To the extent that a W policy is successful in estimating
localities, it will tend to have $n(t)$ approximately $n_0(t)$ (see
Appendix III). In contrast, V1 and V2 policies, which have by
definition no direct way of estimating locality, must necessarily
use some cruder form of load control. A typical control for
these cases is a preestablished limit $N$ on the allowable level of
multiprogramming, sometimes with an adjustment of the limit
$N$ inversely with the system paging rate. To keep the thrashing
probability low, it is necessary to set $N$ so that the event
$k\,n_0(t) < N$ is unlikely, which implies that the system spends
most of its time operating at a suboptimal level of
multiprogramming. Thus even if a V2 policy is successful in keeping
locality sets resident, it will tend to be less efficient than a W
policy. Finally, one should expect V1 policies to be even less
efficient than V2 policies, since they have no mechanism at all
for tending to reallocate pages from resident sets that are larger
than their contained locality sets to resident sets that are
smaller. Because they thus make poorer overall use of storage,
they cannot maintain as high a level of multiprogramming at a
given level of paging as a good V2 policy and, by (2.11), their
processing efficiency will be lower. The empirical evidence
supporting this ranking of the classes is discussed next.
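
The inherent load control of a W policy, as expressed by (4.6) and the
deactivation rule at R(t) = 0, can be sketched as follows. This is only
an illustration under stated assumptions: the class name
`LoadController`, its methods, and the choice of the most recently
activated program as the one to deactivate are hypothetical and do not
come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class LoadController:
    """Toy W-policy load control (all names are illustrative)."""
    M: int                      # total pages of main memory
    H: int                      # safety margin H of (4.6)
    resident: dict = field(default_factory=dict)  # program -> working set size

    def reserve(self) -> int:
        # R(t): pages that belong to no resident set
        return self.M - sum(self.resident.values())

    def try_activate(self, name: str, z: int) -> bool:
        # Defer activation unless z < R(t) - H, as in (4.6)
        if z < self.reserve() - self.H:
            self.resident[name] = z
            return True
        return False

    def on_page_fault(self, name: str):
        # If the reserve is exhausted, a program must be deactivated.
        # Assumption: the most recently activated program is chosen.
        victim = None
        if self.reserve() == 0:
            victim = next(reversed(self.resident))
            del self.resident[victim]
        if name in self.resident:
            self.resident[name] += 1   # working set grows by the faulted page
        return victim
```

A larger `H` makes activation more conservative and forced deactivations
rarer, which is the trade-off the parameter is meant to tune.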

Class V1 Policies

Though it may not be obvious that varying a partition without
correlation to program behavior can increase system processing
efficiency, V1 policies are capable of improving over
fixed-partition policies. This was first observed in a study of the
so-called biasing discipline by Belady and Kuehner on the M44
system [2], [3]. According to this discipline, a “favored state”
of execution is passed cyclically among the active programs. A
given program remains in the favored state until the system has
experienced $p$ page faults ($p$ is a parameter). While favored, a
program is granted new pages on demand for its resident set and
is exempted from replacements; thus its resident set size can
increase by as many as $p$ pages during its favored interval. A
system  throughput  increase  in  the  range  10-15  percent  over  an 
(approximate)  equipartition  using  FIFO  replacement  is  re¬ 
ported.  Belady  and  Kuehner  suggest  (but  do  not  prove)  that 
the  performance  improvement  derives  from  the  convexity  of 
the lifetime function. In Appendix II we show that, given a
fixed-partition size vector $X = (X_1, \ldots, X_n)$, there exists a V1
policy under which the fault rate of each active program $P_i$
satisfies

$$f_i(X_i) > \bar{f}_i > f_i(\bar{x}_i) \qquad (4.7)$$

where $\bar{f}_i$ is the mean virtual-time fault rate of $P_i$, and $\bar{x}_i$ is the
mean resident size in the virtual time of $P_i$. The left-hand
inequality assumes that the lifetime function is convex over the
range of memory allocations used; the right-hand inequality
assumes that the fault-rate function is convex. One can show
$\bar{x}_i > X_i$, even if the right-hand inequality is false, directly from
the assumption that $f_i$ is a decreasing function. Thus the fixed
partition $(\bar{x}_1, \ldots, \bar{x}_n)$, which would be yet more efficient than
the V1 policy, is hypothetical; it cannot be implemented, since
$\bar{x}_i > X_i$ implies $\bar{x}_1 + \cdots + \bar{x}_n > M$.

Analyses by Spirn [39], Ghanem [21], and Chamberlin et al.
[7] have given further information about partition policies.
These authors worked with lifetime functions of the type
discussed earlier (see Fig. 6). They discovered that processing
efficiency may be increased relative to an equipartition, by
amounts comparable to those observed by Belady and Kuehner,
simply by using an imbalanced partition. No variable
partition is needed. (See Appendix II.) Spirn showed further
that the equipartition may be the worst possible. Moreover,
Ghanem showed that this result may depend on the lifetime
function’s being “sufficiently convex” for $x < x_0$ (cf. Fig. 6).
That is, for lifetime functions of the form $L(x) = c x^k$ ($x < x_0$),
it was necessary that $k > a$, where $a$ is a constant depending on
program and system parameters; typically $1 < a < k < 2.5$.
Ghanem found a stronger result: when the lifetime is
insufficiently convex, the equipartition is optimal.

It  thus  appears  that  two  factors  may  have  combined  to  pro¬ 
duce  the  effect  observed  by  Belady  and  Kuehner.  One  is  that 
their  policy  always  kept  the  system  in  some  imbalanced 
partition.  By  changing  the  partition  only  at  page-fault  times, 
they  not  only  kept  the  overhead  of  their  policy  to  a  minimum, 
but  they  distributed  the  improvements  uniformly  among  the 
active  programs.  The  other  factor  is  the  space  variation  acting 
in  the  virtual  time  of  the  programs,  producing  the  relation  (4.7). 

All these analyses and observations lead inescapably to the
conclusion that lifetime functions of programs are significantly
nonlinear, a fact which has yet to be reconciled with a linear
assumption to which one may be led on superficial inspection of
a recent paper [34]. (See also [17].)

Class  V2  Policies 

An approach to memory management commonly used in
operating systems extends the idea of a fixed-space replacement
policy to multiprogramming simply by applying the replacement
rule to the entire contents of memory, without identifying
which program is using any given page. For example, all
resident set pages (of $Z_i(t)$ for each $i$) can be placed on a single
(“global”) LRU stack; whenever an active program runs, it will
presumably  reference  its  locality  set  pages  and  move  them  to 
the  top  of  this  LRU  stack.  A  load  control  is  necessary  (but  not 
sufficient)  for  the  successful  implementation  of  such  a  policy, 
for  if  there  are  too  many  active  programs,  pages  will  be  taken 
from  the  resident  set  of  the  least  recently  run  program  (whose 
pages  will  tend  to  occupy  the  lowest  stack  positions),  where¬ 
upon  that  program  when  run  will  soon  experience  a  page  fault. 
Even  if  the  load  is  properly  controlled,  a  running  program  that 
fortuitously  generates  a  page  fault  before  referencing  much  of 
its  locality  set  will  not  have  moved  many  locality  pages  to  the 
top  of  the  LRU  stack,  whereupon,  when  next  run,  it  may  find 
part  of  its  locality  set  missing  from  memory.  And  this  state  may 
persist,  as  the  program  is  now  unable  to  reference  many  pages 
at  all  between  page  faults.  For  these  reasons,  this  type  of  policy 
has been found highly susceptible to thrashing, and it is
sometimes precarious to expect such policies to perform better than
fixed-partition policies [4], [18], [32]. Similar remarks apply
to a policy based on a single (“global”) FIFO list; it is worth
noting, however, that a FIFO-based policy with load control
was used successfully on the M44 system [2], [3].
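
The global LRU idea described above can be sketched as follows. This is
a minimal illustration, assuming a hypothetical `GlobalLRU` class in
which every resident page is keyed by a `(program, page)` pair; the names
are not taken from the paper or from any system it cites.

```python
from collections import OrderedDict


class GlobalLRU:
    """One LRU stack shared by all programs (illustrative names)."""

    def __init__(self, frames):
        self.frames = frames
        # Keys are (program, page); the last key is the top of the stack.
        self.stack = OrderedDict()

    def reference(self, program, page):
        """Return (fault, victim); victim is the preempted key or None."""
        key = (program, page)
        if key in self.stack:
            self.stack.move_to_end(key)      # move to the top of the stack
            return False, None
        victim = None
        if len(self.stack) >= self.frames:
            # Replace the page at the bottom of the stack, whoever owns it.
            victim, _ = self.stack.popitem(last=False)
        self.stack[key] = None
        return True, victim
```

With three frames, the references P1:a, P1:b, P2:x, P2:y cause P2's
fault on y to preempt P1's page a, so P1 faults as soon as it runs again;
this is the coupling between programs that makes load control necessary.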

A  variant  of  the  “global  LRU”  policy  described  is  based  on  a 
usage  bit  u  and  a  changed  bit  c  associated  with  every  resident 
page.  The  bit  u  is  set  to  1  by  the  addressing  hardware  on  any 
reference  to  the  given  page,  and  is  cleared  to  0  by  the  memory 
management  routine.  The  bit  c  is  set  to  1  by  the  addressing 
hardware  on  any  write  reference  to  the  given  page,  and  is  cleared 
when  the  page  is  loaded  or  when  a  copy  is  made  in  the  paging 
I/O store. At intervals, the memory management routine
scans all resident set pages and maintains them in four lists
according to the possible values of the bits $(u, c)$. At a page
fault, the first page of the first nonempty list in the ordering

$$(u, c) = [(0, 0), (0, 1), (1, 0), (1, 1)]$$

is selected for replacement. This policy, which approximates
LRU [36], is subject to the same problems when used for
multiprogramming.
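
The selection rule based on the bits (u, c) can be sketched as follows.
The `Page` record and the function name `select_victim` are assumptions
made for this example; the periodic scan that rebuilds the four lists is
reduced here to a single pass over the resident pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    u: int = 0   # usage bit, set on any reference
    c: int = 0   # changed bit, set on any write reference


# Replacement preference: unused and unchanged pages go first.
ORDER = [(0, 0), (0, 1), (1, 0), (1, 1)]


def select_victim(pages):
    """Return the first page of the first nonempty (u, c) list, or None."""
    lists = {bits: [] for bits in ORDER}
    for p in pages:
        lists[(p.u, p.c)].append(p)
    for bits in ORDER:
        if lists[bits]:
            return lists[bits][0]
    return None
```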

A  well-known  example  of  a  V2  policy  is  used  in  Multics.  It 
combines  elements  of  the  global  LRU  and  global  FIFO 
policies.  It  is  sometimes  referred  to  as  first  in,  not  used,  first 
out  (FINUFO)  [11],  [12].  All  the  resident  set  pages  are 
linked  in  a  circular  list  with  a  pointer  designating  the  “current 
position,”  and  each  has  a  usage  bit  set  by  the  hardware  when 
the page is referenced. Whenever a page fault occurs, the
memory policy advances the current-position pointer around
the list, clearing set usage bits, and stopping at the first page
whose usage bit is already clear: this page is selected for
replacement. A program whose locality set is resident will
evidently  fare  well  under  this  policy,  since  it  will  be  able  to 
continue  setting  all  its  usage  bits  between  the  times  when  the 
memory  policy  examines  them.  However,  an  active  program 
whose  locality  set  is  not  loaded,  or  which  is  accorded  con¬ 
tinuing  low  priority  for  use  of  the  processor,  will  tend  to  lose 
pages  under  this  policy.  The  success  of  this  policy  will  depend 
on  its  being  carefully  coordinated  with  the  scheduler,  which 
must  control  the  load  and  ensure  that  all  active  programs  have 
an equal chance to use the processor. There are no performance
data comparing this against a fixed partition or V1
policy.
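
The FINUFO scan can be sketched as follows. This is a simplified
illustration, not the Multics implementation: the class name `Finufo` is
hypothetical, pages of all programs share one circular list, and it is
assumed that a newly loaded page has its usage bit set by the reference
that caused the fault.

```python
class Finufo:
    """Circular list with usage bits (illustrative sketch)."""

    def __init__(self, frames):
        self.pages = [None] * frames    # circular list of resident pages
        self.used = [False] * frames    # usage bit of each position
        self.hand = 0                   # current-position pointer

    def reference(self, page):
        """Return (fault, victim); victim is the replaced page or None."""
        if page in self.pages:
            self.used[self.pages.index(page)] = True   # hardware sets the bit
            return False, None
        n = len(self.pages)
        # Advance the pointer, clearing set bits, until a clear bit is found.
        while self.used[self.hand]:
            self.used[self.hand] = False
            self.hand = (self.hand + 1) % n
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.used[self.hand] = True     # assumption: faulting reference sets it
        self.hand = (self.hand + 1) % n
        return True, victim
```

When every usage bit is set, the pointer makes a full circuit and the
policy degenerates to FIFO; a page referenced between two faults survives
the next scan.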

Another example of a V2 policy is the “AC/RT” procedure
suggested by Belady and Tsao [4]. Associated with each active
program $P_i$ are variables $AC_i$ and $RT_i$, whose values are updated
at each page fault of $P_i$. The “activity count” $AC_i$ registers
that fraction of its resident set referenced since $P_i$’s last page
fault; the “round-trip frequency” $RT_i$ registers that fraction of
the last $K$ page faults ($K$ a parameter) of $P_i$ which caused
the recall of the most recently replaced page. A high value of
$AC_i$ indicates that $P_i$ is making effective use of its resident set.
A high value of $RT_i$ indicates a high frequency of mistakes in
replacement decisions. The decision rule for replacement, used
on a page fault of $P_i$, is summarized as follows. If $RT_i$ is low,
replace a page from the resident set of $P_i$; otherwise, replace a
page from that $Z_j(t)$, $j \ne i$, for which $AC_j$ is lowest. Belady
and Tsao discuss how to select threshold values to tell when
AC and RT are “high” and “low,” and infer (but do not test)
that this policy will perform better than policies in class V1
and the global LRU or FIFO policies in class V2. As with other
V2 policies, however, load control must augment the AC/RT
procedure. Too high a level of multiprogramming can force the
persistent state in which all the $AC_i$ are low and the $RT_i$ are
high, which is the state of thrashing.
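
The AC/RT decision rule summarized above can be sketched as follows.
The function name, the dictionary representation of the AC and RT
values, and the single threshold `rt_threshold` are assumptions made for
illustration; the threshold selection discussed by Belady and Tsao is not
reproduced here.

```python
def choose_victim_program(faulting, ac, rt, rt_threshold):
    """Pick the program whose resident set gives up a page.

    faulting     -- name of the program that took the page fault
    ac, rt       -- dicts mapping program name to AC and RT values
    rt_threshold -- RT values below this are treated as "low" (assumed)
    """
    if rt[faulting] < rt_threshold:
        # Few replacement mistakes: replace within the faulting program.
        return faulting
    others = [j for j in ac if j != faulting]
    if not others:
        return faulting
    # Otherwise take a page from the program making least use of its set.
    return min(others, key=lambda j: ac[j])
```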

We have returned repeatedly to the need for V2 (and V1)
policies to be augmented by a load control. Operational
experience with Multics and CP-67 indicates that an effective
combination of a V2 policy and load control can be designed
[11], [32]. The same was true of the V1 biasing policy on the
M44 [2], [3]. With proper load control, V2 policies will tend
to be better than V1 policies because their capability of
reallocating pages from resident sets that are too large for their
locality sets, to resident sets that are too small, permits a higher
level of multiprogramming without an increase of paging.
Since heavy-demand conditions are not at all uncommon, one
arrives at the conclusion to include the load control and
locality estimation explicitly in the memory policy, that is, at
class W.

Class W Policies

As has been noted, W policies have two distinguishing
features. First, the resident sets are precisely the estimates of the
current locality sets of active programs. Moreover, the locality
estimate of $P_i$ is formed by observing the behavior of $P_i$
only; it is not influenced by the activity of any other program
$P_j$. (Contrast this with the V2 policies, in which a resident set
$Z_i(t)$ is a function not only of the activity of $P_i$ itself but of
the activities of other programs as well.) Second, load control
is inherent in the definition of W policies, since program
activation and deactivation decisions must be consistent with the
requirement that locality set estimates of all active programs
must be resident.

The definition of W policies implies the existence of a memory
reserve of size $R(t)$, that is, a set of pages not in any resident
set [see (4.4)]. To improve memory utilization, some systems
allocate the reserve $R(t)$ to an $(n+1)$st program $P_0$, whereupon
$Z_0(t)$ is a subset of $P_0$'s locality set and page faults by any
$P_i$ ($0 \le i \le n$) cause pages to be preempted from $Z_0(t)$. In case
$Z_0(t) = R(t) = 0$, $P_0$ is considered to be automatically
deactivated, and the lowest priority program among $P_1, \ldots, P_n$
assumes the role of $P_0$. System thrashing cannot occur in this
case: although $P_0$ is the only program without a full locality
set present, its page faults are not permitted to preempt pages
from other resident sets and, accordingly, the feedback among
paging rates necessary for thrashing does not exist (see [38]
and [42]).

The most extensively studied example of a class W policy
uses the moving-window working set $W_i(t, T)$, defined
previously, as the locality estimator. Numerous experiments have
shown the ease with which one can find a suitable value for the
window size $T$ so that the working set is indeed a reliable
estimator of a program’s locality [8], [20], [23], [31], [38],
[40]. However, the estimator is not perfect [30].
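
A moving-window working set can be computed from a reference string as
in the sketch below. The function names are hypothetical, and the window
convention is an assumption: W(t, T) is taken to be the set of pages
referenced at virtual times t-T+1 through t, with a fault at time t
whenever r(t) is not in W(t-1, T).

```python
def working_set(refs, t, T):
    """Pages referenced at virtual times t-T+1 .. t (times are 1-based)."""
    start = max(0, t - T)
    return set(refs[start:t])


def ws_fault_count(refs, T):
    """Count references r(t) that are absent from W(t-1, T)."""
    faults = 0
    for t in range(1, len(refs) + 1):
        if refs[t - 1] not in working_set(refs, t - 1, T):
            faults += 1
    return faults
```

For the string a b c a b c d d d and T = 3, the working set holds three
pages during the first phase and shrinks to the single page d in the
second, which is the variation in resident set size that a fixed
partition cannot follow.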

Morris  reports  how  the  MANIAC  II  computer  implements  a 
close  approximation  of  the  moving-window  working  set,  by 
associating  hardware  timers  with  each  page  of  main  memory 
and  arranging  to  run  a  given  page’s  timer  only  when  the  pro¬ 
gram  owning  that  page  is  running  on  the  processor;  all  at 
modest  cost  [25].  A  method  of  approximating  a  working  set 
by  examining  usage  bits  at  the  ends  of  time  slices  appeared 
successful  in  preliminary  tests  of  the  RCA  Spectra  70/46  [41  ] . 
A  similar  procedure  was  used  on  the  Grenoble  CP-67,  for 
which  extensive  test  data  show  enormous  improvements  in 
performance  over  the  V2  policy  used  on  the  standard  CP-67 
[32].  Another  similar  procedure  has  been  used  successfully 
on  at  least  one  TSS  system  [  18] .  A  method  using  two  window 
sizes  to  define  three  states  of  a  page  (in,  partly  in,  and  out  of 
the  working  set)  has  been  reported  successful  in  UNIVAC’s 
VMOS [20]. These and other practical and successful
implementations show definitively that W policies are neither
difficult nor expensive to implement; they are at worst marginally
more expensive than V2 policies and give significantly better
performance, if only because they are able to operate at a
maximal level of multiprogramming without thrashing.

An interesting variant to the fixed-window-size working set
defined in the foregoing has been studied by Chu and Opderbeck
using extensive simulations [8], [27]. Their procedure,
known as page-fault frequency (PFF), recomputes a program’s
working set at each page fault time $t$ of that program, using the
time interval since the prior page fault of that program (time
$t'$) as a window. The computation requires merely the examination
of usage bits. Unfortunately, should the current window
$t - t'$ fortuitously be small, few usage bits will have been set;
since this will cause the next page-fault interval to be short, the
state of the working set underestimating locality will persist.
Protection against this is easily achieved. If the interval $t - t'$ is
smaller than a given threshold $T_0$, the incoming page is added
to the resident set but no replacement is made (though the usage
bits are cleared). The acronym PFF arises since $1/T_0$ has the
interpretation of the maximum allowable mean rate (frequency)
of page faults. The resident set defined by PFF for
program $P_i$ at a page-fault time $t$, to be in effect until $P_i$’s next
page fault, is

$$Z_i(t) = \begin{cases} W_i(t,\, t - t'), & t - t' > T_0 \\ Z_i(t') + r(t), & \text{otherwise} \end{cases}$$

where $t'$ is the time of the prior page fault and $r(t)$ is the
(missing) page referenced at time $t$. Besides the usage bits, the
full implementation evidently requires only a timer register in
the processor to compute $t - t'$. Chu and Opderbeck’s studies
indicate that $T_0$ can easily be chosen so that PFF is
indistinguishable from a fixed-window working set [8], and that PFF
used as a W policy is significantly better than certain LRU-type
policies from classes V1 and V2 [27].

Fig. 8. Comparison of W policy and fixed partition.
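
The PFF rule can be sketched for a single program as follows. The
function name `pff` and its return values are assumptions for this
example. The sketch follows the formula literally: usage bits are cleared
at every fault, so the window covers references strictly after the prior
fault, and the faulting page itself is always added to the resident set.

```python
def pff(refs, T0):
    """Simulate PFF on one reference string.

    Returns (faults, sizes), where sizes[t-1] is the resident set size
    in effect after the reference at virtual time t.
    """
    resident = set()
    used = set()        # pages whose usage bit was set since the last fault
    last_fault = 0      # t', the time of the prior page fault
    faults = 0
    sizes = []
    for t, page in enumerate(refs, start=1):
        if page in resident:
            used.add(page)
        else:
            faults += 1
            if t - last_fault > T0:
                # Keep only pages referenced since the prior fault.
                resident = used | {page}
            else:
                # Faults are too frequent: grow without replacement.
                resident = resident | {page}
            used = set()            # usage bits are cleared in both cases
            last_fault = t
        sizes.append(len(resident))
    return faults, sizes
```

On the string a b a b a b c d c d c d e with T0 = 2, the resident set
grows to four pages during the transition between the two phases and
drops back to three at the final fault, when the pages a and b of the
old locality are released.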

Fig. 8 suggests why a W policy will be better than a
fixed-partition policy, as long as programs are run in a region of the
fault-rate curve in which WS is superior (cf. Fig. 5). Let $z_i$
denote the resident set size for program $P_i$ under a fixed partition
using LRU separately for each resident set. As long as $z_i > x_{0i}$,
there will exist a point $w_i < z_i$ corresponding to a mean working
set size under which the program would achieve the same fault
rate as under the LRU policy. Setting $W = w_1 + \cdots + w_n$, this
means that the average level of multiprogramming could be
increased approximately by the ratio $M/W$ without increasing the
system fault rate over the original fixed-partition policy, which
in turn implies an increase in processing efficiency [cf. (2.11)].

Fig. 9 suggests the application of this principle to conclude
that a W policy will be superior to a V1 policy. The three
points on the vertical line through $\bar{x}_i$ depict the relation (4.7)
given earlier for V1 policies. As long as $X_i > x_{0i}$, there will
exist a point $w_i$ at which $f(\mathrm{WS}, w_i) = \bar{f}_i$. Setting
$W = w_1 + \cdots + w_n$, the average level of multiprogramming could be
approximately the ratio $M/W$ and yield, as before, higher
efficiency. (The W policy produces less of an improvement over
the V1 policy than over the fixed partition. Define $w'_i$ so that
$f(\mathrm{WS}, w'_i) = f(\mathrm{LRU}, X_i)$; note that $w'_i < w_i$ and that
$W' = w'_1 + \cdots + w'_n < W$. Therefore, the ratio $M/W'$ is larger than
$M/W$.)

Fig. 9. Comparison of W and V1 policies.


The foregoing discussion shows that working set policies
increase processing efficiency over other policies. However, they
have been shown to improve other measures as well. Chu and
Opderbeck, for example, show that the “space-time cost”
(integral of resident set size over time) satisfies

$$ST(\mathrm{WS}, T) < ST(\mathrm{LRU}, x) \qquad (4.9)$$

for all $x$, and all $T$ in a very wide range [8]. In fact, the
minimum difference between the two sides of this inequality
ranged 10-30 percent, the greater differences being directly
correlated to a large coefficient of variation (ratio of standard
deviation to mean) in locality set size. The function
$ST(\mathrm{LRU}, x)$ had a sharp minimum, while $ST(\mathrm{WS}, T)$ had a
very wide and flat minimal region; therefore, for $x$ injudiciously
chosen, the space-time cost difference may far exceed the
30 percent figure just quoted. Coffman and Ryan studied two
measures of storage utilization (overflow probability and mean
amount by which demand exceeds resident set size), comparing
a working set partition against a fixed equipartition [9]. With
respect to these measures, the working set partition was always
at least slightly better, and significant differences would exist
for larger coefficients of variation in locality size.

The  W  policies  appear  superior  by  many  measures;  their 
superiority  is  associated  with  changing  locality  size  in  pro¬ 
grams,  and  the  degree  of  superiority  increases  as  the  coefficient 
of  variation  in  locality  size  increases. 

V.  Conclusions 

The first part of this paper explained a network representation
of a typical multiprogrammed computer system, and used
it to establish properties used later in the paper. a) Increasing
the load (that is, level of multiprogramming) without changing
system or program parameters increases processing efficiency.
b) Decreasing the paging rate for fixed load increases processing
efficiency. c) Paging rates will generally increase with increasing
load because of the fixed total-memory constraint. This
implies an optimum load, above which efficiency drops rapidly
(thrashing). Load control is necessary. d) Processing efficiency
is a suitable measure of system performance, since throughput
is directly proportional to it and response time inversely
proportional to it.

The second part of the paper explained basic properties and
measures of program behavior. The principal observations are
as follows. a) The fault-rate function of LRU is frequently
observed to be convex, while the lifetime function frequently has
a convex/concave shape. b) The fault-rate function of working
set (WS) is frequently observed to be significantly below that of
LRU, a direct proof of locality size variation during program
execution.


The  third  part  of  the  paper  explained  a  classification  of 
multiprogrammed  memory  management  policies,  then  used  the 
results  of  the  previous  sections,  together  with  information  from 
the  literature,  to  establish  a  ranking  among  five  classes  of 
policies,  from  worst  to  best: 

1)  fixed  partition,  balanced; 

2)  fixed  partition,  imbalanced; 

3) variable partition, no correlation with program behavior
(V1);

4)  variable  partition,  some  correlation  with  program  behavior 
(V2);  and 

5)  variable  partition,  direct  estimation  of  locality  (W). 

(This ranking should be interpreted to mean that, given a policy
at rank $i$, there exists a better one at rank $i + 1$.) The principal
conclusions are the following. a) Imbalanced partitions are
better than balanced partitions, partly because they recognize
inherently different memory requirements of programs, and also
because of the convex property of the lifetime function. In
many cases, an equipartition is the worst possible, even among
programs with identical memory demand characteristics.
b) Even though they do not correlate memory reallocations
with program behavior, V1 policies may nonetheless improve
over fixed-partition policies. Two factors operate: the
avoidance of the equipartition and the effect of increasing average
processor demand over the virtual time of programs; both
factors are attributable to the convexity of the lifetime
function. c) V2 policies do better than V1 policies because they
obtain better space utilization by reallocating from resident
sets that are larger than contained localities to resident sets that
are smaller, and because, with proper load control, they tend
to keep each program’s locality set present. d) W policies do
better than V2 policies because they estimate locality directly,
the estimates are independent of load and other programs’
demands for memory, and they have inherent load control.
Numerous studies show W policies best according to a variety
of measures. Their implementation cost is not significantly
more than for V1 or V2, and the gain in performance amply
rewards the investment in them.

Working set (W) policies establish a limit on the load $n(t)$ at
each time $t$. To the extent that these policies succeed in
estimating locality sets, $n(t)$ will approximate the optimal load
$n_0(t)$. Experience shows that these policies keep the
probability of thrashing (that is, the probability that $n(t) > k\,n_0(t)$
for some small constant $k \ge 1$) acceptably small. In contrast,
V1 and V2 policies have no direct method of estimating a
proper load level. Typically they establish a prior limit $N$ on
the load (sometimes with adjustments in $N$ inversely with the
system paging rate). Since $N$ must be chosen so that the
thrashing probability (the probability that $k\,n_0(t) < N$) is low, the
system runs much of the time at suboptimal efficiency. In
other words, the more precise load control of working set
policies is, of itself, a significant reason for their success.

All  the  arguments,  and  all  the  experiments,  used  to  demon¬ 
strate  the  superiority  of  working  set  policies  rely  directly  on, 
or  are  correlated  directly  with,  significant  variations  in  program 
locality size over virtual time. Though there has been
considerable work on modeling program behavior (e.g., [10], [22],
[38], [40]), none of it has so far produced a working model
in which locality set size variation is accounted for. Many
experiments show that many programs exhibit a marked
propensity for two or more particular working set sizes [22],
[23], [31], and that working set fault rates are significantly
less than LRU fault rates over a wide range of memory
constraints; these observations cannot be accounted for under the
assumption of fixed locality size. The next iteration in the
process of program behavior modeling must be the development
of techniques for representing locality set size variation.

The  viability  of  working  set  policies  and  locality-based  pro¬ 
gram  models  appears  assured. 


Appendix  I 

Computation  of  Fault-Rate  Functions 

Outlined here are computationally efficient methods for
finding LRU and WS fault-rate functions, for a given reference
string $R = r(1) \cdots r(k) \cdots r(K)$. The techniques are treated
fully in [10], [15], [24], and [37].


The  LRU  Algorithm 


The LRU stack at virtual time $k$ is a vector
$s(k) = (s_1, \ldots, s_{q(k)})$ of distinct pages, in which
$1 \le i < j \le q(k)$ implies that page $s_i$ was more recently
referenced than $s_j$, and $q(k)$ is the number of distinct pages
referenced through time $k$. The initial stack $s(0)$ is empty. The
stack distance $d(k)$ of the reference $r(k)$ is $i$ if $r(k)$ is at the
$i$th position in stack $s(k-1)$, and is $\infty$ if $r(k)$ is not in stack
$s(k-1)$. If $r(k) = y$, the new stack $s(k)$ is related to the former by

$$s(k) = \begin{cases}
(y, s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{q(k-1)}), & \text{if } d(k) = i \le q(k-1) \\
(y, s_1, \ldots, s_{q(k-1)}), & \text{if } d(k) = \infty.
\end{cases}$$

Note in the second case $q(k) = q(k-1) + 1$.

Since the LRU algorithm always replaces the least recently
used page, it follows that the pages resident in an $x$-page
memory managed by LRU at time $k$ are precisely the first $x$
entries in the stack $s(k)$ and, moreover, that a page fault occurs
at time $k$ if and only if $d(k) > x$. Therefore, the fault-rate
function $f(\mathrm{LRU}, x)$ is the fractional number of distances that
exceed $x$. To calculate $f(\mathrm{LRU}, x)$, one must process the references
$r(1) r(2) \cdots$, computing the stacks $s(0) s(1) s(2) \cdots$,
and recording the occurrences of stack distances $d(1) d(2) \cdots$
in the counters $c[1:N]$ and $c[\infty]$. (The number of program
pages is $N$.) When this is done, $c[i]$ counts the number of
virtual times $k$ at which $d(k) = i$. Once the stack distance
counts have been determined, the number of page faults for an
$x$-page memory is $c[x+1] + \cdots + c[N] + c[\infty]$; therefore, the
LRU fault rate is computed from the recursion formula

\[
f(\mathrm{LRU}, N) = \frac{c[\infty]}{K}
\]

\[
f(\mathrm{LRU}, x-1) = \frac{c[x]}{K} + f(\mathrm{LRU}, x), \qquad x = N, N-1, \ldots, 1.
\]


To obtain the counts, the following procedure is used:

    c[1:N, ∞] := 0; stack[1:N] := 0;           [initialize]
    for k := 1 to K do
        y := r(k);                              [next reference]
        i := 1; candidate := stack[1];          [search and update stack for entry y]
        while candidate ≠ y and candidate ≠ 0
            do exchange(candidate, stack[i + 1]);
               i := i + 1;
            end
        if candidate = 0                        [update proper counter]
            then c[∞] := c[∞] + 1
            else c[i] := c[i] + 1;
        stack[1] := y;                          [put referenced page atop stack]
    end
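The stack-distance procedure and the backward recursion for the LRU fault rate can be sketched in Python as follows. This is an illustrative sketch only, not the authors' program: the function names are hypothetical, the stack is kept as a Python list with the most recently used page first, and a dictionary replaces the fixed counter array c[1:N].

```python
def lru_distance_counts(refs):
    """Return (counts, cold): counts[i] is the number of references at
    finite stack distance i; cold is the number at infinite distance."""
    stack = []      # most recently used page first
    counts = {}     # finite stack distance -> number of occurrences
    cold = 0        # first references to a page (distance = infinity)
    for y in refs:
        if y in stack:
            d = stack.index(y) + 1          # 1-based stack position
            counts[d] = counts.get(d, 0) + 1
            stack.remove(y)
        else:
            cold += 1
        stack.insert(0, y)                  # referenced page goes on top
    return counts, cold


def lru_fault_rate(refs, n_pages):
    """Return [f(LRU, 0), ..., f(LRU, n_pages)] via the backward recursion
    f(LRU, N) = c[inf]/K and f(LRU, x-1) = c[x]/K + f(LRU, x)."""
    K = len(refs)
    counts, cold = lru_distance_counts(refs)
    f = [0.0] * (n_pages + 1)
    f[n_pages] = cold / K
    for x in range(n_pages, 0, -1):
        f[x - 1] = counts.get(x, 0) / K + f[x]
    return f


if __name__ == "__main__":
    # Small hypothetical reference string over three pages.
    print(lru_fault_rate([1, 2, 3, 2, 3, 1], 3))
```

A single pass over the reference string yields the fault rate for every memory size at once, which is the point of the stack-distance formulation.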


The WS Algorithm

Associate with $R = r(1) \cdots r(k) \cdots r(K)$ a sequence of backward
distances $B = b(1) \cdots b(k) \cdots b(K)$, in which $b(k) = i$
implies $r(k-i) = r(k)$ and $r(k') \ne r(k)$ for $k - i < k' < k$; take
$b(k) = \infty$ if $r(k)$ is the first reference to a page. In other words,
$b(k)$ is the interval since the prior reference to page $r(k)$. (For
example, if $R = 1\,2\,3\,2\,3\,1$, $B = \infty\,\infty\,\infty\,2\,2\,5$.) The next reference
$r(k+1)$ is missing from the working set $W(k, T)$ if and only if
$b(k+1) > T$. Define the counters $c[1:K]$ and $c[\infty]$ to
record the occurrences of backward distances; thus $c[i]$ counts
the number of distinct virtual times $k$ at which $b(k) = i$. Analogous
to LRU, the missing page rate for pure working set
memory allocation is defined by the recursion formula

\[
m(K) = \frac{c[\infty]}{K}
\]

\[
m(T-1) = \frac{c[T]}{K} + m(T), \qquad T = K, K-1, \ldots, 1.
\]


To obtain the counts, this algorithm can be used:

    c[1:K, ∞] := 0; time[1:N] := 0;
    for k := 1 to K do
        y := r(k);
        if time[y] = 0
            then c[∞] := c[∞] + 1
            else i := k - time[y];
                 c[i] := c[i] + 1;
        time[y] := k;
    end
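The backward-distance counting and the recursion for the missing-page rate can be sketched in Python in the same spirit. The function names are hypothetical, and a dictionary of last-reference times stands in for the array time[1:N]; `math.inf` marks a first reference.

```python
import math


def backward_distances(refs):
    """Return the sequence B: b(k) is the interval since the prior
    reference to page r(k), or infinity for a first reference."""
    last = {}   # page -> virtual time of its most recent reference
    B = []
    for k, y in enumerate(refs, start=1):
        B.append(k - last[y] if y in last else math.inf)
        last[y] = k
    return B


def missing_page_rate(refs):
    """Return [m(0), ..., m(K)] via the backward recursion
    m(K) = c[inf]/K and m(T-1) = c[T]/K + m(T)."""
    K = len(refs)
    c = [0] * (K + 1)   # c[i] for finite distances i = 1..K
    cold = 0            # c[inf]
    for b in backward_distances(refs):
        if b == math.inf:
            cold += 1
        else:
            c[b] += 1
    m = [0.0] * (K + 1)
    m[K] = cold / K
    for T in range(K, 0, -1):
        m[T - 1] = c[T] / K + m[T]
    return m


if __name__ == "__main__":
    refs = [1, 2, 3, 2, 3, 1]
    print(backward_distances(refs))   # three infinities, then 2, 2, 5
    print(missing_page_rate(refs))
```

Each entry m(T) is the fraction of references whose backward distance exceeds T, so the whole curve is again obtained from one pass over the string.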


The working set size at time $k$ is denoted by $w(k, T)$ and
the mean working set size by

\[
s(T) = \frac{1}{K} \sum_{k=1}^{K} w(k, T).
\]

Define $\Delta(k, T)$ to be 1 if $r(k)$ is missing from $W(k-1, T)$ and
0 otherwise. Then note $w(k, T+1) = w(k-1, T) + \Delta(k, T)$;
substituting into the definition of $s(T)$ evaluated at $T+1$, and
using $w(0, T) = 0$,

\[
s(T+1) = \frac{1}{K} \sum_{k=1}^{K} w(k, T) - \frac{w(K, T)}{K} + \frac{1}{K} \sum_{k=1}^{K} \Delta(k, T).
\]

Recognizing the last term as a definition of the missing-page
rate $m(T)$, we find the recursion formula for calculating mean
working set size:

\[
s(0) = 0
\]

\[
s(T+1) = s(T) + m(T) - \frac{w(K, T)}{K}, \qquad T = 0, 1, \ldots, K-1.
\]




PROCEEDINGS  OF  THE  IEEE,  JUNE  1975 


Finally, the fault-rate function is denoted by $f(\mathrm{WS}, x)$ and is
given parametrically by

\[
f(\mathrm{WS}, s(T)) = m(T), \qquad T \ge 0.
\]
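As a check on the mean working set size recursion, the following Python sketch compares it with a direct count of window contents. It assumes the recursion in the form s(T+1) = s(T) + m(T) − w(K, T)/K, where the last term is the end correction that follows from w(k, T+1) = w(k−1, T) + Δ(k, T) with w(0, T) = 0; all function names are hypothetical.

```python
def working_set_size(refs, k, T):
    """w(k, T): number of distinct pages among r(k-T+1), ..., r(k)."""
    return len(set(refs[max(0, k - T):k]))


def mean_ws_direct(refs, T):
    """s(T) computed directly from its definition."""
    K = len(refs)
    return sum(working_set_size(refs, k, T) for k in range(1, K + 1)) / K


def missing_rate(refs, T):
    """m(T): fraction of references whose backward distance exceeds T."""
    last = {}
    misses = 0
    for k, y in enumerate(refs, start=1):
        if y not in last or k - last[y] > T:
            misses += 1
        last[y] = k
    return misses / len(refs)


def mean_ws_recursive(refs, T_max):
    """[s(0), ..., s(T_max)] from the recursion with end correction."""
    K = len(refs)
    s = [0.0]
    for T in range(T_max):
        s.append(s[T] + missing_rate(refs, T) - working_set_size(refs, K, T) / K)
    return s


def ws_fault_curve(refs, T_max):
    """Points (s(T), m(T)) of the parametric curve f(WS, s(T)) = m(T)."""
    s = mean_ws_recursive(refs, T_max)
    return [(s[T], missing_rate(refs, T)) for T in range(T_max + 1)]


if __name__ == "__main__":
    refs = [1, 2, 3, 2, 3, 1]
    for x, rate in ws_fault_curve(refs, len(refs)):
        print(f"mean size {x:.3f}  fault rate {rate:.3f}")
```

For long strings the correction term w(K, T)/K is small relative to s(T), which is presumably why the recursion is often quoted without it.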


Examples 

Consider the three reference strings over a 10-page program.

\[
R_1 = 0\,1 \cdots 9\,(9 \cdots 1\,0\,0\,1 \cdots 9)^{10}
\]

\[
R_2 = 0\,1\,(0\,1)^{20}\,2\,3 \cdots 9\,(2\,3 \cdots 9)^{20}
\]

\[
R_3 = 0\,1\,(1\,0\,0\,1)^{10}\,2\,3 \cdots 9\,(9 \cdots 3\,2\,2\,3 \cdots 9)^{10}.
\]

All have length $K = 210$. $R_1$ represents a program using a
single 10-page locality; since $R_1$ has the property that, at any
time, the page with the largest stack distance is also the one
with the maximum forward distance, LRU is optimal for $R_1$
[24]. In contrast, $R_2$ and $R_3$ represent programs which have
two disjoint localities $\{0, 1\}$ and $\{2, 3, \ldots, 9\}$. In $R_3$, LRU
is optimal just as in $R_1$. Fig. 10 shows the fault-rate curves
for LRU and WS for these strings.

It is observed that LRU is always better for $R_1$, WS is at least
as good for $R_2$, and WS is better for $R_3$ provided $x > 6.6$. The
superiority of WS over LRU for (certain ranges of $x$ in) $R_2$
and $R_3$ directly results from these two strings' exhibiting two
distinct phases over different size localities. For suitable
choices of the window size $T$, the working set measures the
locality set exactly (as long as the window is contained within
a phase), so that the only paging occurs during locality
transitions. However, the average working set size is less than
that of the larger locality; LRU operating at that same memory
size produces page faults continuously in that phase because
that locality will not fit into the available space. It is especially
important to note that, because of its ability to adapt its space
requirement to varying program locality, WS is capable of
improving over an optimal fixed-space algorithm (such as LRU
applied to $R_3$). Similar observations have been made in
practice [28]-[30].

Fig. 11 shows the lifetime functions for the three strings.
Each exhibits the characteristic convex/concave shape. The
concave region for the LRU lifetime function begins at the
maximum locality size. The concave region for the WS lifetime
function begins at approximately the virtual-time average
locality size. (For strings $R_2$ and $R_3$, the average locality size
is computed as $(2 \cdot 42 + 8 \cdot 168)/210 = 6.8$.)
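The three example strings are easy to build and check mechanically. The sketch below (hypothetical helper names, pages numbered 0 to 9) constructs them, confirms the common length of 210 and the average locality size of 6.8, and counts LRU faults by direct simulation to illustrate why a fixed allocation just below the larger locality size performs so badly on the second string.

```python
def build_strings():
    """Construct the three example reference strings over pages 0..9."""
    up = list(range(10))                 # 0 1 ... 9
    down = up[::-1]                      # 9 ... 1 0
    small, large = [0, 1], list(range(2, 10))
    r1 = up + (down + up) * 10
    r2 = small + small * 20 + large + large * 20
    r3 = small + ([1, 0] + small) * 10 + large + (large[::-1] + large) * 10
    return r1, r2, r3


def average_locality_size(refs):
    """Virtual-time average of the locality size (2 or 8 pages)."""
    return sum(2 if page < 2 else 8 for page in refs) / len(refs)


def lru_faults(refs, frames):
    """Count page faults of LRU in a fixed memory of `frames` pages."""
    resident = []                        # most recently used page first
    faults = 0
    for page in refs:
        if page in resident:
            resident.remove(page)
        else:
            faults += 1
            if len(resident) == frames:
                resident.pop()           # evict the least recently used page
        resident.insert(0, page)
    return faults


if __name__ == "__main__":
    r1, r2, r3 = build_strings()
    print([len(r) for r in (r1, r2, r3)])
    print(average_locality_size(r2))
    print(lru_faults(r2, 8), lru_faults(r2, 7))
```

With 8 page frames LRU incurs only the 10 first-reference faults on the second string, whereas with 7 frames every reference in the 8-page phase faults.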

Appendix  II 

Analysis  of  VI  Policies  (See  Also  [16]) 

General  Properties 

Let $\{x_k\}$ be a sequence containing at least two distinct values
such that the function $h(x)$ is convex for $\min\{x_k\} \le x \le \max\{x_k\}$,
and let $\{a_k\}$ be a set of positive weights that sum
to 1. A well-known property of convex functions is

\[
\sum_k a_k h(x_k) > h\Big(\sum_k a_k x_k\Big). \qquad (1)
\]

Our objective is proving relation (4.7) of the text which
states that, given a partition size vector $X = (X_1, \ldots, X_n)$, one



may construct a VI policy under which, for each program $P_i$,

\[
f_i(X_i) > \bar{f}_i > f_i(\bar{x}_i) \qquad (2)
\]

where $f_i$ is the fault-rate function, $\bar{f}_i$ is the mean virtual-time
fault rate under the VI policy, and $\bar{x}_i$ is the mean virtual-time
memory allocation under the VI policy. The left-hand inequality
requires the convexity of the lifetime function, the
right-hand one the convexity of the fault-rate function.

Let $t_0 = 0$ and $t_1, \ldots, t_r$ denote a sequence of successive
page-fault times on a system's processor. Let $r_i$ denote the
number of faults generated by program $P_i$ in the observation
interval $(0, t_r)$, and note that

\[
r = \sum_{i=1}^{n} r_i. \qquad (3)
\]

Let $x_{ik}$ denote the memory allocation of program $P_i$ in the
processing interval just preceding its $k$th page fault ($1 \le k \le r_i$).
Each $x_{ik}$ is assumed to lie in the convex region of $L_i$. The
mean virtual-time interval from the $(k-1)$st to the $k$th page
fault in $P_i$ is taken to be the lifetime $L_i(x_{ik})$. (This is, in fact,
an approximation. As will be discussed shortly, however, it
does not affect the conclusions.) Under these assumptions,
the mean lifetime in $P_i$ over the observation interval is

\[
\bar{L}_i = \frac{1}{r_i} \sum_{k=1}^{r_i} L_i(x_{ik}) \qquad (4)
\]

and the mean fault rate over this interval is $\bar{f}_i = 1/\bar{L}_i$. Define
the quantity

\[
X_i = \frac{1}{r_i} \sum_{k=1}^{r_i} x_{ik} \qquad (5)
\]

which is the mean memory allocation measured at page-fault
times. We shall show shortly how the scheduler can choose
the allocations $\{x_{ik}\}$ so that $X_i$ is the same as the resident set
size of $P_i$ according to the given fixed partition $X$.

Under the given fixed partition $X$, the mean lifetime interval
of program $P_i$ is $L_i(X_i)$, so that the mean system lifetime
interval is the total processing time consumed divided by the
total number of page faults:

\[
L = \sum_{i=1}^{n} \frac{r_i}{r} L_i(X_i). \qquad (6)
\]

Under a VI policy satisfying (5), the mean system lifetime
interval is

\[
L' = \sum_{i=1}^{n} \frac{r_i}{r} \bar{L}_i \qquad (7)
\]

since $r_i \bar{L}_i$ is, from (4), the total time consumed by $P_i$. Applying
(1) to (4),

\[
\bar{L}_i > L_i(X_i). \qquad (8)
\]

Since lifetime is the reciprocal of fault rate, this establishes the
left-hand inequality of (2). Applying (8) to (6) and (7), we
have $L' > L$, which implies that the relative utilization of the
paging I/O station satisfies

\[
R_p' = S/L' < S/L = R_p \qquad (9)
\]

where $S$ is the mean service time at the paging I/O station;
together with (2.14) of the text, this implies that a VI policy
satisfying (5) must increase processor utilization over the fixed
partition $X$.

As noted in the text after (2.9), the use of $L'$ and $L$ in (9)
is an approximation. The processor utilization is in reality a
function of all the lifetime intervals, not just their mean.
Ghanem [21] and Spirn [39] have shown that, when the $L_i$ are
sufficiently convex, $L' > L$ will imply the increase in utilization
as argued here. Spirn showed that observed lifetime
functions do usually have the required convexity; hence our
simple argument is sufficient to justify our conclusions.

To establish the right-hand inequality of (2), recall that $\bar{L}_i$
is the mean lifetime interval in $P_i$, and note that

\[
\bar{x}_i = \sum_{k=1}^{r_i} \frac{L_i(x_{ik})}{r_i \bar{L}_i} \, x_{ik}. \qquad (10)
\]

Using the definition of fault rate as reciprocal of lifetime, and
assuming the fault-rate function is convex, we have the inequalities
as desired:

\[
\bar{f}_i = \frac{1}{\bar{L}_i} = \sum_{k=1}^{r_i} \frac{L_i(x_{ik})}{r_i \bar{L}_i} \, f_i(x_{ik}) > f_i\Big( \sum_{k=1}^{r_i} \frac{L_i(x_{ik})}{r_i \bar{L}_i} \, x_{ik} \Big) = f_i(\bar{x}_i). \qquad (11)
\]

Since $f_i$ is decreasing, relation (11) implies $\bar{x}_i > X_i$. However,
$\bar{x}_i > X_i$ can be shown directly, even if $f_i$ is not convex. Observe
that there exists $u$ such that $L_i(x) \ge L_i(u)$ if and only if $x \ge u$,
and consider


(12) 

It was noted prior to (4) that the use of $L_i(x_{ik})$ is an approximation.
The reason is that the virtual-time interval between
the $(k-1)$st and $k$th page faults may be interrupted by $p \ge 0$
file I/O requests, so that $P_i$ in fact experiences during this
interval a resident set size sequence $y_0 y_1 \cdots y_p$ in which
$y_0 \ge y_1 \ge \cdots \ge y_p$ and $x_{ik} = y_p$. However, this implies that
$L_i(x_{ik})$ underestimates the true lifetime in this interval; therefore,
$\bar{L}_i$ underestimates the true mean lifetime, and relations
(8) and (9) remain valid. Moreover, $\bar{x}_i$ underestimates the true
virtual-time resident set size, and relation (12) remains valid.
Finally, (11) remains valid, for we can interpret $\bar{f}_i$ as the true
value, observe that the second equality in (11) is an identity,
and then recall that $\bar{x}_i$ is an underestimate and $f_i$ is decreasing.
The errors introduced by this approximation are not likely to
be large, especially in systems with $R_f \le 1$, for the mean file
I/O service time is usually 10 times the mean paging I/O service
time, and $R_f \le 1$ implies that $p = 0$ at least 90 percent of the
time.

Implementation 

Equations (6) and (7) allow for the possibility of an arbitrary
scheduling discipline over the observation interval; the ratios
$r_i/r$ reflect the relative priorities given to the programs. For
FIFO scheduling, each of these ratios will tend to be $1/n$.

A VI policy satisfying (5) for a given partition $X$ may be
approximated arbitrarily closely using an adaptive procedure.



Let $D_i$ denote the relative deviation of the mean resident size
of $P_i$ from the desired $X_i$:

\[
D_i = \frac{1}{r_i} \sum_{k=1}^{r_i} \frac{x_{ik} - X_i}{X_i}. \qquad (13)
\]

The estimator $D_i$ can be updated on each page fault of $P_i$ by
the statements

    D_i := (D_i r_i + (z_i - X_i)/X_i) / (r_i + 1)
    r_i := r_i + 1                                        (14)

where $z_i$ is the resident set size at the page fault. The memory
allocation decision rule can be implemented as a two-phase
repeating procedure. During the "converge" phase, a page
fault in $P_i$ will result in a page being removed from $P_j$, where
$j = i$ if $D_i > 0$, and $j$ is the index of the program with largest
positive deviation if $D_i \le 0$. The effect of a memory reallocation
during this phase will be to reduce the total relative
deviation of the memory partition from the desired $X$. During
the "diverge" phase, a page fault in $P_i$ will result in a page
being removed from $P_j$, where $j$ is the index of the program
with smallest absolute deviation. The effect of a main
memory reallocation in this phase will be an increase in the
total relative deviation of the memory partition from the
desired $X$. At the end of a pair of diverge/converge phases, a
partition sequence that conforms to (5) will have been generated,
whereupon the VI policy has generated higher processor
utilization than the fixed partition $X$.
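The incremental update (14) of the deviation estimator is a running mean, which the following Python sketch illustrates. The class and attribute names are hypothetical and the converge/diverge decision rule itself is not modelled; only the bookkeeping for a single program is shown.

```python
class DeviationEstimator:
    """Running estimate D_i of the relative deviation of a program's
    resident set size from its target X_i, updated at each page fault."""

    def __init__(self, target):
        self.target = target     # X_i, the desired mean allocation
        self.faults = 0          # r_i, page faults observed so far
        self.deviation = 0.0     # D_i, mean relative deviation so far

    def record_fault(self, resident_size):
        """Apply update (14) for a fault taken at resident set size z_i."""
        relative = (resident_size - self.target) / self.target
        self.deviation = (self.deviation * self.faults + relative) / (self.faults + 1)
        self.faults += 1
        return self.deviation


if __name__ == "__main__":
    est = DeviationEstimator(target=20)
    for z in (10, 30, 30):
        print(est.record_fault(z))
```

A positive value after a fault indicates that the program has, on average, been holding more than its target, which is the condition the converge phase tests before choosing the program that gives up a page.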

If $X$ is an equipartition, any symmetric memory reallocation
procedure with FIFO scheduling, such as the cyclically
permuted favored state under the "biasing" policy [2], [3],
is sufficient to produce a VI policy improving over $X$.


Fixed  Imbalanced  Partitions 

It is possible for a fixed imbalanced partition to improve
over an equipartition. Let $X$ be a given partition in which at
least two resident sets have different size. Suppose that FIFO
scheduling is used at the processor and paging I/O stations.
Under these assumptions, $r_i/r = 1/n$ and $\bar{x}_i = X_i$ for each $i$.
The mean lifetime for $X$ will be larger than that of the equipartition
if

\[
\frac{1}{n} \sum_{i=1}^{n} L_i(X_i) > \frac{1}{n} \sum_{i=1}^{n} L_i(M/n) \qquad (15)
\]

which is certainly true if there exists a convex function $L$
passing through the points $L_i(X_i)$ and $L_i(M/n)$; in fact, if
$L_i = L$ for all $i$, every imbalanced partition is better than the
equipartition. (See also [21], [39], pertaining to networks
with different queueing disciplines.)


Example 

Consider two active programs with the same fault rate and
lifetime functions:

    x      f(x)      L(x)
    10     100/S     S/100
    20     4/S       S/4
    30     1/S       S

where $S$ is the mean paging I/O station service time. Suppose
that the file I/O station is unused ($R_f = 0$). Consider a (non-demand
paging) variable-partition policy that allocates memory
according to the partition sequence (10,30)(30,10) for equal
numbers of page faults in each partition; and a fixed partition
(20,20). Using the formulas given earlier with $S = 10$:


    Measure                  Partitions
                   (10,30)(30,10)      (20,20)
    $L$                 5.05             2.50
    $\bar{f}$           0.20             0.40
    $\bar{x}$           29.8             20.0
    $X$                 20.0             20.0
    $U_0$               0.43             0.24

The utilization $U_0$ was computed according to Buzen's method
[6] and verified by simulation. A linear interpolation between
$f(20)$ and $f(30)$ gives $f(\bar{x}) = 0.11$ and verifies relation (2).
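The tabulated measures can be reproduced with a few lines of Python. The sketch below takes S = 10 and the lifetime values from the first table; the utilization routine is an assumption on our part, namely a two-station cyclic network (processor and paging I/O station) with exponential servers and two circulating programs, solved with the normalizing-constant recurrence in the style of Buzen's method. Under that assumed model the computed utilizations agree with the tabulated 0.43 and 0.24. All names are hypothetical.

```python
S = 10.0                                        # mean paging I/O service time
LIFETIME = {10: S / 100, 20: S / 4, 30: S}      # L(x) from the table
FAULT_RATE = {x: 1 / L for x, L in LIFETIME.items()}


def policy_measures(allocations):
    """allocations: sizes x_k held by one program at successive faults.
    Returns (mean lifetime, mean fault rate, X, virtual-time mean size)."""
    r = len(allocations)
    L_bar = sum(LIFETIME[x] for x in allocations) / r                   # (4)
    f_bar = 1 / L_bar
    X = sum(allocations) / r                                            # (5)
    x_bar = sum(x * LIFETIME[x] for x in allocations) / (r * L_bar)     # (10)
    return L_bar, f_bar, X, x_bar


def interpolated_fault_rate(x):
    """Linear interpolation of f between x = 20 and x = 30."""
    return FAULT_RATE[20] + (FAULT_RATE[30] - FAULT_RATE[20]) * (x - 20) / 10


def cpu_utilization(cpu_time, io_time, jobs):
    """Assumed model: closed two-station cyclic network, exponential
    servers. G[n] is the normalizing constant for n circulating jobs."""
    G = [sum(cpu_time ** k * io_time ** (n - k) for k in range(n + 1))
         for n in range(jobs + 1)]
    return cpu_time * G[jobs - 1] / G[jobs]


if __name__ == "__main__":
    for name, allocs in (("variable", [10, 30]), ("fixed", [20, 20])):
        L_bar, f_bar, X, x_bar = policy_measures(allocs)
        U0 = cpu_utilization(L_bar, S, jobs=2)
        print(name, round(L_bar, 2), round(f_bar, 2), round(x_bar, 1), X, round(U0, 2))
```

The variable partition has the same fault-time mean allocation X = 20 as the fixed one, yet its mean lifetime is roughly twice as long, which is the convexity effect expressed by relation (8).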

By the symmetry of the example, the imbalanced fixed
partition (10,30) will produce $U_0 = 0.43$, while the balanced
partition (20,20) will produce $U_0 = 0.24$, verifying that
balanced partitions may be less efficient than imbalanced ones.

See  [16]  for  another  view  of  this  analysis. 

Appendix  III 

Near-Optimal  Partitions 

Fig. 12 suggests why a working set policy is capable of
generating a near-optimal partition. Consider a set of active
programs having the same lifetime function $L$ under a working
set policy, and suppose that $L$ does not increase much for
$x$ larger than the point $x_0$. (Specifically, assume that the
slopes satisfy $L'(x_0) < L'(w_0)$, for $w_0$ to be defined below.)
For a partition $\mathbf{x}$, the mean lifetime (assuming FIFO queueing
in the network) is

\[
L(\mathbf{x}) = \frac{1}{n} \sum_{i} L(x_i).
\]

Our objective is to find a partition that maximizes $L(\mathbf{x})$.

Consider a working set partition with average resident sizes
$\mathbf{w} = (w_0, w_1, \ldots, w_n)$ in which $w_i = x_0$ for $1 \le i \le n$ and
$w_0 = M - (w_1 + \cdots + w_n) < x_0$, that is, one in which the reserve
memory $w_0$ is allocated to an $(n+1)$st program. Consider
any other partition $\mathbf{v} = (v_0, v_1, \ldots, v_n)$, such as might be
generated under another variable-partition policy. If any
$v_i > x_0$, $\mathbf{v}$ cannot maximize $L(\mathbf{x})$ since decreasing $v_i$ to $x_0$ and
reallocating the pages $v_i - x_0$ to the program with smallest $v_j$
will increase $L(\mathbf{v})$. Assuming all $v_i \le x_0$, then all $v_i \ge w_0$, or
else $v_0 + \cdots + v_n = M$ is impossible. Since $L(\mathbf{w})$ lies on the

chord connecting the points $w_0$ and $x_0$ on the $L$ curve, and
since $L(\mathbf{v})$ lies below the chord between $v_0$ and [illegible] on the $L$
curve, it is clear that $L(\mathbf{w}) > L(\mathbf{v})$. In other words, $\mathbf{w}$ is an
optimal partition for $n$ programs and will maximize processing
efficiency.

An optimal partition $\mathbf{u}$ for $n - 1$ programs will have $L(\mathbf{u}) >
L(x_0) \ge L(\mathbf{w})$; whether it produces higher processing efficiency
than $\mathbf{w}$ depends on whether the increase in $L$ offsets the decrease
in load. A partition $\mathbf{u}'$ for fewer than $n - 1$ programs
will have $L(\mathbf{u}') \approx L(\mathbf{u})$; since it has smaller load than $\mathbf{u}$, it is
less efficient. It is not difficult to see that a partition $\mathbf{v}'$ for
$n + 1$ programs (in which $v_i < x_0$) has $L(\mathbf{v}') < L(\mathbf{w})$; whether
the processing efficiency for $\mathbf{v}'$ exceeds that of $\mathbf{w}$ depends on
whether the effect of increased load offsets that of decreased
lifetime.




The point is that the partition $\mathbf{w}$ will approximate an optimal
partition and an optimal load. It remains only to recall that
the point $x_0$ is approximately the mean locality size of the
program, which can be approximated by a working set policy
for a wide range of window size. (See also [21].)

References 

[1] L. A. Belady, “A study of replacement algorithms for virtual
storage computers,” IBM Syst. J., vol. 5, no. 2, pp. 78-101,
1966.

[2] L. A. Belady, “Biased replacement algorithms for multiprogramming,”
IBM T. J. Watson Res. Cent. Note NC 697, Mar. 1967.

[3] L. A. Belady and C. J. Kuehner, “Dynamic space sharing in computer
systems,” Commun. Ass. Comput. Mach., vol. 12, pp. 282-288,
May 1969.

[4] L. A. Belady and R. F. Tsao, “Memory allocation and program
behavior under multiprogramming,” in Proc. Computer Science &
Statistics: 7th Annu. Symp. Interface (Oct. 1973), pp. 72-78.

[5] A. Brandwajn, “A model of a time sharing virtual memory system
solved using equivalence and decomposition methods,” Acta
Informatica, vol. 4, pp. 11-47, 1974.

[6] J. P. Buzen, “Computational algorithms for closed queueing networks
with exponential servers,” Commun. Ass. Comput. Mach.,
vol. 16, pp. 527-531, Sept. 1973.

[7] D. D. Chamberlin, S. H. Fuller, and L. Y. Liu, “A page allocation
strategy for multiprogramming systems with virtual memory,”
IBM T. J. Watson Res. Cent. Rep. RC 3848, May 1972.

[8] W. W. Chu and H. Opderbeck, “The page fault frequency replacement
algorithm,” in Proc. AFIPS Conf. (1972 FJCC), pp. 597-609.

[9] E. G. Coffman, Jr., and T. J. Ryan, Jr., “A study of storage
partitioning using a mathematical model of locality,” Commun.
Ass. Comput. Mach., vol. 15, pp. 185-190, Mar. 1972.

[10] E. G. Coffman, Jr., and P. J. Denning, Operating Systems Theory.
Englewood Cliffs, N.J.: Prentice-Hall, 1973, chs. 6 and 7.

[11] F. J. Corbato, “A paging experiment with the Multics system,”
in In Honor of P. M. Morse, K. U. Ingard, Ed. Cambridge, Mass.:
M.I.T. Press, 1969, pp. 217-228.

[12] P. J. Denning, “Resource allocation in multiprocess computer
systems,” M.I.T. Project MAC Rep. MAC-TR-50, May 1968.

[13] P. J. Denning, “The working set model for program behavior,”
Commun. Ass. Comput. Mach., vol. 11, pp. 323-333, May 1968.

[14] P. J. Denning, “Virtual memory,” Computing Surveys, vol. 2,
pp. 153-189, Sept. 1970.

[15] P. J. Denning and S. C. Schwartz, “Properties of the working set
model,” Commun. Ass. Comput. Mach., vol. 15, pp. 191-198,
Mar. 1972.

[16] P. J. Denning and J. R. Spirn, “Dynamic storage partitioning,” in
Proc. 4th ACM Symp. Operating Systems Principles (Oct. 1973),
pp. 74-79.

[17] P. J. Denning, “Comments on a linear paging model,” in Proc.
ACM SIGMETRICS Symp. System Performance Evaluation
(Oct. 1974).

[18] W. Doherty, “Scheduling TSS/360 for responsiveness,” in Proc.
AFIPS Conf. (1970 FJCC), pp. 97-111.

[19] D. Ferrari, “Improving locality by critical working sets,” Commun.
Ass. Comput. Mach., vol. 17, pp. 614-620, Nov. 1974.

[20] M. H. Fogel, “The VMOS paging algorithm,” in ACM SIGOPS
Operating Syst. Rev., vol. 8, pp. 8-17, Jan. 1974.

[21] M. Z. Ghanem, “The lifetime function shape and the optimal
memory allocation,” IBM T. J. Watson Res. Cent. Rep., Sept.
1973.

[22] M. Z. Ghanem and H. Kobayashi, “A parametric representation
of program behavior in a virtual memory,” IBM T. J. Watson Res.
Cent. Rep., Sept. 1973.

[23] D. J. Hatfield and J. Gerald, “Program restructuring for virtual
memory,” IBM Syst. J., vol. 10, pp. 168-192, 1971.

[24] R. L. Mattson, D. Slutz, I. Traiger, and J. Gecsei, “Evaluation
techniques for storage hierarchies,” IBM Syst. J., vol. 9, pp. 78-101,
1970.

[25] J. B. Morris, “Demand paging through utilization of working sets
on the MANIAC II,” Commun. Ass. Comput. Mach., vol. 15, pp.
867-872, Oct. 1972.

[26] R. R. Muntz, “Analytic modeling of interactive systems,” this
issue, pp. 946-953.

[27] H. Opderbeck and W. W. Chu, “Performance of the page fault
frequency algorithm in a multiprogramming environment,” in
Proc. IFIP Congress 1974.

[28] B. G. Prieve, “Page partition replacement algorithm,” Ph.D.
dissertation, Dep. Elec. Eng. and Comput. Sci., Univ. of California,
Berkeley, Dec. 1973.

[29] B. G. Prieve and R. S. Fabry, “Evaluation of a page partition
replacement algorithm,” Bell Lab., Naperville, Ill., Tech. Rep.,
Oct. 1973.

[30] B. G. Prieve and R. S. Fabry, “An optimal variable space page
replacement algorithm,” Bell Lab., Naperville, Ill., Tech. Rep.,
May 1974.

[31] J. Rodriguez-Rosell, “Experimental data on how program behavior
affects the choice of scheduler parameters,” in Proc. 3rd
ACM Symp. Operating System Principles (Oct. 1971), pp.
156-163.

[32] J. Rodriguez-Rosell and J.-P. Dupuy, “The design, implementation
and evaluation of a working set dispatcher,” Commun. Ass.
Comput. Mach., vol. 16, pp. 247-253, Apr. 1973.

[33] T. A. Ryan, Jr., and E. G. Coffman, Jr., “A problem in multiprogrammed
storage allocation,” IEEE Trans. Comput., vol. C-23,
pp. 1116-1122, Nov. 1974.

[34] J. H. Saltzer, “A simple linear model of demand paging performance,”
Commun. Ass. Comput. Mach., vol. 17, pp. 181-185,
Apr. 1974.

[35] D. Sayre, “Is automatic ‘folding’ of programs efficient enough to
displace manual?” Commun. Ass. Comput. Mach., vol. 12, pp.
656-660, Dec. 1969.

[36] A. Shaw, The Logical Design of Operating Systems. Englewood
Cliffs, N.J.: Prentice-Hall, 1974.

[37] D. R. Slutz and I. Traiger, “A note on the calculation of average
working set size,” Commun. Ass. Comput. Mach., vol. 17, pp.
563-565, Oct. 1974.

[38] J. R. Spirn, “Program locality and dynamic memory management,”
Ph.D. dissertation, Dep. Elec. Eng., Princeton Univ.,
Princeton, N.J., Mar. 1973.

[39] J. R. Spirn, “A model for dynamic memory allocation in a paging
machine,” in Proc. 8th Princeton Conf. (Mar. 1974).

[40] J. R. Spirn, P. J. Denning, and J. E. Savage, “Models for locality
in program behavior,” to be published in Acta Informatica, 1975.

[41] N. Weizer and G. Oppenheimer, “Virtual memory management
in a paging environment,” in Proc. AFIPS Conf. (1969 SJCC), pp.
234 ff.

[42] M. V. Wilkes, “The dynamics of paging,” Comput. J., vol. 16,
pp. 4-9, Feb. 1973.




CHAPTER  THREE 


EXPERIMENTAL  INVESTIGATION  OF  MEMORY 
POLICIES 


3.1 INTRODUCTION


Our purpose in this chapter is to study program
behaviour using empirical reference string trace data.
Specifically, we have investigated a) the phase/transition
model of program behaviour, b) the relationships among
important performance measures, and c) the criteria for
memory policy controllers. An overview of each area appears
in the following section; the methodology of the experiments
appears next; finally, the results of investigations (a),
(b), and (c) are described in detail in individual sections.


3.1.1 The Phase/Transition Model of Program Behaviour


The phase/transition model of program behaviour was
described in Section 1.1, where we commented on the page
reference maps of Hatfield and Gerald [H1], and again in
Section 2.4.5 as an example of a semi-Markov model of
program behaviour. The phase/transition model employs a
macromodel to specify the intervals of constant memory
demand (phases), separated by transitions; it employs a
micromodel to specify the detailed reference pattern within
the locality set of a phase. Within this characterization,
we can determine the resource demands of a particular
phase/transition model and investigate memory performance
measures.


Our purpose here is to interpret empirical reference
string trace data in terms of the phase/transition model of
program behaviour. We investigate whether certain model
predictions (hypotheses) are supported by the data, and we
also consider certain model features not directly related to
the data. Our results are of three types. First, we
attempt to determine the relation between lifetime curve
features (knees) and locality set sizes, and whether these
features correspond to nested or disjoint phases. Second,
we investigate the dominance of the DWS lifetime curve at
its primary knee: we study the hypothesis that, when all
nonlookahead memory policies are constrained to operate at
the mean resident set size producing the knee of the DWS
lifetime curve, they will achieve smaller lifetime values
than DWS at that resident set size. We also study the
hypothesis that the DWS lifetime knee slope (the ratio of
lifetime value to resident set size at the primary knee)
dominates the knee lifetime slopes of other nonlookahead
memory policies. Third, we investigate the errors in
locality set estimation inherent in the DWS and PFF
policies.

Our experiments directly support a number of major
hypotheses predicted by the phase/transition model. That
these hypotheses are supported raises significantly our
confidence both in the utility of this model and in some of
the model predictions we have not been able to measure.
However, these experiments rule out neither the possibility
that future experiments will reject certain model
hypotheses, nor the possibility that other models will be
found equally effective.

3.1.2  Relationships  among  Performance  Measures 

The  throughput  of  a  wide  class  of  systems  is  a 
fundamental  performance  measure  because,  when  maximized,  it 
implies  that: 

-  work  capacity  is  maximum, 

-  CPU  utilization  is  maximum, 

-  the  average  space-time  per  job  is  minimum,  and 

-  response  time  is  minimum. 

These  relations  will  be  proved  in  Section  3.4.  We  have 
observed  strong  relations  between  lifetime  curve  knees  and 
space-time  minima  for  individual  programs  under  various 
memory  policies,  suggesting  that  lifetime  curve  features  may 
be  exploited  to  maximize  the  work  capacity  of  a  system. 
Using  simple  queueing  networks,  we  have  studied  the  effect 
of  several  jobs  in  the  systems;  we  have  indeed  verified  that 
operating  the  DWS  and  PFF  memory  policies  near  their  primary 
knees  tends  to  cause  near  maximal  system  throughput. 


3.1.3  Criteria  for  Memory  Policy  Controllers 


In order to avoid an overcommitment of main memory and
thrashing, a load control policy is necessary. As noted in
Section 2.2.3, it is desirable to integrate a load control
and memory management policy in one design (e.g., the Class
V3 memory policies). Load (level of multiprogramming) is
controlled in such policies by packing main memory as full
as possible with estimated locality sets. The load rises
and falls as the estimated locality sets of active programs
shrink and grow.


An important factor in selecting a Class V3 policy is
the overhead of the policy. Overhead is generated by a)
activation or deactivation decisions (changing load), and b)
memory policy parameter adjustments. The amount of overhead
of type (a) depends on how the totality of resident sets
vary in size, and is easily controlled by imposing a minimum
on the size of the pool of unused memory pages [O1]. The
overhead of type (b) depends on the ease with which the
memory policy can be adjusted to give a good estimate of
program locality. Our experiments focus on evaluating the
minimum possible type (b) overhead without knowing the
details of an implementation; we can characterize the best
possible overhead over all implementations. We do this by
seeking answers to the questions:


1.  How often is a policy parameter adjustment needed to
keep operating near the lifetime knee (assuming from
the prior section that this is a near optimal
operating point)?


2.  What  is  the  least  number  of  distinct  parameter  values 
that  arise  across  the  set  of  programs  when  operating 
within  specified  tolerances  of  the  knee? 

3.  Does the operating point (x,LT(x)) change
monotonically and continuously with respect to
parameter changes?
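
Question (3) can be checked mechanically once a policy has been run at several parameter values. The following is a minimal sketch in Python, not the data reduction code used in the report; the function names and the sample measurements are hypothetical. It tests whether the operating point (x, LT(x)) moves monotonically as the parameter grows, and reports the largest jump in x between adjacent parameter values as one simple indicator of gap behaviour.

```python
def is_monotone(points):
    """points: iterable of (parameter, x, lifetime) tuples.

    Returns True when both the mean resident set size x and the
    lifetime LT(x) are nondecreasing as the parameter increases.
    """
    pts = sorted(points)  # order by parameter value
    return all(a[1] <= b[1] and a[2] <= b[2] for a, b in zip(pts, pts[1:]))


def largest_gap(points):
    """Largest jump in x between adjacent parameter values."""
    pts = sorted(points)
    return max((b[1] - a[1] for a, b in zip(pts, pts[1:])), default=0.0)


# Hypothetical measurements: (parameter, mean resident set size, lifetime).
smooth = [(1000, 20.0, 900.0), (2000, 25.0, 1500.0), (4000, 31.0, 2600.0)]
anomalous = [(1000, 20.0, 900.0), (2000, 45.0, 800.0), (4000, 46.0, 2600.0)]

print(is_monotone(smooth), is_monotone(anomalous))
print(largest_gap(smooth), largest_gap(anomalous))
```

The second data set is constructed so that the lifetime drops while the resident set grows, which is the kind of nonmonotone movement the question is meant to detect.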


Our examination of the data in Section 3.5 will show
that DWS and PFF have strikingly different characteristics,
DWS having significantly lower intrinsic overhead as
measured for question (2). Moreover, PFF is observed to
exhibit instabilities, manifested as nonmonotone and gap
behaviour as measured for question (3).




3.2  METHODOLOGY 


Six programs, traced during the summer of 1973 at the
IBM Thomas J. Watson Research Center, produced eight trace
tapes; they are summarized in Table 3.2.1. Two criteria
were used to select programs for tracing (recording the
address sequence). First, they represented a range of
program behaviours. Second, the programs were heavily used
and were a substantial load on the system. Beyond these
criteria, no attempt was made to select "typical" programs.
These trace tapes have also been used in other work [G4].


Each program was traced until (nearly) the end of its
execution, filling an output trace tape. (See Appendix A
for details.) A full trace tape contained approximately
2·10^[?] instructions or 4·10^[?] page references. The
number of distinct pages referenced in the first X=649,984
and first 2X references is given in the table. A
significant number of new pages are referenced for the
first time during the second X references.

The length of the reference string processed by the data
reduction programs is an important parameter of the
experiments. The length used uniformly throughout the
experiments to be described was 2X=1,299,968 references.
Some exploratory work was performed to arrive at this
figure. We are interested in the relatively stable
behaviour of lifetime curve knees and of the minima of
space-time curves. Several programs were processed by the
DWS policy to generate their lifetime curves for initial
reference string segments of length X (=649,984), 2X, 3X,
and 4X. Figure 3.2.1 exhibits a typical result. The y1 and
y2 operating points (Section 2.1.3) and the knee window size
are comparable for the lengths 2X, 3X, and 4X, but the
corresponding features for the length X are not comparable.
We concluded:

1.  The reference string segment should be at least 2X
(=1,299,968) in order that the y2 points (and window
size values at the y2 points) be relatively
insensitive to experiment length.

2.  We chose a length of 2X to keep costs in data
collection down.

3.  The knee slope (LT(y2)/y2) does depend on the
experiment length. Any results depending on the
numerical value of the slope must be properly
interpreted (e.g., the space-time or queueing
network answers).


Ref.
Str.   Source characteristics (X = 649,984)

P1     TSS Assembler, assembling an error-free source module
       of about 10 printer pages
P2     Continuation of reference string P1
P3     TSS PL/I compiler, compiling a program of about 3
       printer pages with 30 syntax errors
P4     TSS Fortran compiler, compiling an error-free program
P5     TSS Redit, the Research editor, during a typical
       session
P6     AC, a TSS PL/I interactive program
P7     ORBIT, a NASA Lewis program that does orbital
       calculations
P8     Continuation of reference string P7

[The numeric columns (number of distinct pages in the first
X and first 2X references) are illegible in this copy.]

Table 3.2.1  Comments on the programs traced


Figure 3.2.1  Effect of reference string length on the
lifetime curve


The sequence of data reduction programs used to derive
the desired performance measures of lifetime function and
space-time cost is outlined in Table 3.2.2 and illustrated
in Figure 3.2.2. The details of the data reduction programs
are in Appendix B. Table 3.2.2 also compares the relative
average costs of executing these programs for 2X references.
VMIN is cheaper because no space-time calculations were
performed for it.

Lifetime functions and space-time costs were measured
for two fixed partition (OPT, LRU) and three variable
partition (VMIN, DWS, PFF) memory management policies. LRU
was chosen as being representative of nonlookahead fixed
partition policies; it is also known to be the most robust
such policy [B6]. OPT was chosen because it is the best
possible fixed partition policy [B6,M2]. DWS was chosen as
being representative of robust nonlookahead variable
partition policies [D3]. The interest in PFF is its simpler
implementation than DWS; the experiments of [C2] show that
DWS and PFF are comparable for certain measures of
performance. VMIN was chosen because it is the best
possible variable partition policy when lifetime function is
the performance measure, and is cheaply measured [D16].
Thus, for each reference string, five lifetime functions
were computed.


Exact space-time costs were computed for all memory
policies except VMIN, and an approximate space-time cost was
computed for DWS. (The VMIN space-time product was not
measured because the interreference interval counting method
used to determine the VMIN lifetime function did not allow
the direct calculation of the space-time product [D6,D16].)
A direct simulation of VMIN would be required for the VMIN
space-time product. It was decided not to measure VMIN
space-time cost because VMIN is not implementable. An
approximation of VMIN space-time cost is discussed below.

All lifetime and space-time data for the trace tapes are
in the Appendices. The key features are summarized in
tables in the text.


Program   Function                                     Cost

TXAM      [description illegible in this copy]         -
STRIP     Reduce address trace by eliminating          $33
          information irrelevant to this study
LRU/OPT   Calculate LRU and OPT lifetime and           $12
          space-time curves
DWS       Calculate DWS lifetime and space-time        $17
          curves at 10 selected windows
PFF       Calculate PFF lifetime and space-time        [illegible]
          curves at selected thresholds
VMIN      Calculate VMIN lifetime curve at 10          [illegible]
          selected windows

Table 3.2.2  Relative average costs of the data reduction
programs


Figure 3.2.2  Sequencing of data reduction program
execution and tape creation


The following approximation for the space-time cost of a
program executing under a variable partition policy with a
mean memory allocation of x' pages was given in Section
2.1.3:

     ST*(x') = L·A1·x'·(1 + A·F(x')),

where L is the reference string length, A1 is the main
memory access speed, A is the access time ratio between main
and auxiliary memory, and F is the page fault rate function.
This formula is exact only when the resident set size
sampled at page fault times is the observed mean resident
set size. To test the approximation, exact and approximate
space-time curves were computed for the DWS policy on every
reference string. A typical result is illustrated in Figure
3.2.3 for reference string P3. Here, the exact DWS
space-time curve has two local minima, whereas the
approximate curve has only one minimum. The error made by
the approximation is large (reaching 20 percent) and not
consistently excessive or deficient. (This corroborates
observations reported in [S14].) We conclude that the
approximate space-time formula is not a reliable predictor
of actual space-time cost for variable partition memory
policies, and that arguments based on it must be
corroborated by other methods.
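
The approximation ST*(x') = L·A1·x'·(1 + A·F(x')) is simple enough to evaluate directly. The following Python sketch is only an illustration under stated assumptions: A1 is treated as a time per main memory reference, and the numerical values of A1, A, x' and F(x') are hypothetical, not measurements from the trace tapes.

```python
def approx_space_time(L, A1, A, x_mean, fault_rate):
    """Approximate space-time cost ST*(x') = L*A1*x'*(1 + A*F(x')).

    L          reference string length (references)
    A1         main memory access time per reference (assumed units: seconds)
    A          access time ratio between auxiliary and main memory
    x_mean     mean memory allocation x' (pages)
    fault_rate F(x'), page faults per reference
    """
    return L * A1 * x_mean * (1.0 + A * fault_rate)


def relative_error(exact, approx):
    """Signed relative error of the approximation against an exact cost."""
    return (approx - exact) / exact


# Hypothetical operating point; only L is taken from the text (2X).
L = 1_299_968
st = approx_space_time(L, A1=1e-6, A=10_000, x_mean=40.0, fault_rate=1 / 5_000)
print(f"approximate space-time cost: {st:.2f} page-seconds")
```

A comparison such as relative_error(exact, approx) at each operating point is the kind of check that would reveal errors of the size reported above.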


The VMIN space-time cost can be estimated from the DWS
space-time cost. Both VMIN and DWS have identical page
fault sequences [D6,D16]. The resident sets are related
(approximately) by s(T) - v(T) = (T-1)·m(T). Thus, the
approximate space-time cost is:

     ST*(T) = A1·L·s(T)·(1 + A·m(T))     for DWS
     ST*(T) = A1·L·v(T)·(1 + A·m(T))     for VMIN

and the difference between DWS and VMIN is approximately

     DIFF = A1·L·(T-1)·m(T)·(1 + A·m(T)).

Note that the estimate of VMIN space-time cost is reliable
to the same extent as that of the DWS space-time cost.
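
As a worked illustration of this estimate, the Python sketch below derives v(T) and both approximate space-time costs from DWS measurements at one window size. The function name and the sample numbers are hypothetical; the relation s(T) - v(T) = (T-1)·m(T) is the approximate one stated in the text.

```python
def dws_and_vmin_estimates(L, A1, A, T, s_T, m_T):
    """Estimate VMIN quantities from DWS measurements at window size T.

    s_T is the DWS mean resident set size and m_T the page fault rate
    (shared by DWS and VMIN at the same window size).
    Returns (v_T, ST_dws, ST_vmin, DIFF).
    """
    v_T = s_T - (T - 1) * m_T            # approximate VMIN mean resident set
    factor = A1 * L * (1.0 + A * m_T)    # common factor of both costs
    st_dws = factor * s_T
    st_vmin = factor * v_T
    return v_T, st_dws, st_vmin, st_dws - st_vmin


# Hypothetical values chosen only to make the arithmetic easy to follow.
v_T, st_dws, st_vmin, diff = dws_and_vmin_estimates(
    L=1000, A1=1.0, A=100.0, T=11, s_T=20.0, m_T=0.1
)
print(v_T, st_dws, st_vmin, diff)
```

The returned difference equals A1·L·(T-1)·m(T)·(1 + A·m(T)) by construction, so it inherits whatever error the DWS approximation carries.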

To obtain each desired performance measure, the DWS and
VMIN analyzers (Figure 3.2.2) were run with a set of window
sizes (usually about ten) that was the same for all
reference strings. Similarly, the PFF analyzer used a set
of threshold sizes that was the same for all reference
strings (see Appendix B). In some cases, additional
parameter values were selected in order to obtain finer
resolution of the performance measures (see Appendix C and
Appendix D).


3.3  A STUDY OF THE PHASE/TRANSITION MODEL


Many  of  the  results  of  this  section  come  from 
predictions  or  properties  of  the  phase/transition  model  of 
program  behaviour.  One  of  the  important  points  is  that  a 
good  deal  of  information  about  phase  behaviour  is  contained 
in  the  lifetime  curves,  especially  the  DWS  lifetime  curves. 
This means that the lifetime curve contains enough
information to allow optimal load control in a wide class of
systems.


Figure 3.2.3  Exact and approximate DWS space-time costs
for P3 (space-time cost in page-seconds versus mean
resident set size in pages)


Four  basic  (and  idealized)  properties  of  the 

phase/transition  model  are: 

1.  There  is  an  interval  D  such  that  whenever  (t-D,t)  is 

contained  in  a  phase,  W(t,D)  is  the  locality  set  of  that 
phase.  The  smallest  such  D  for  a  given  set  of  phases  and 
transitions  is  called  the  minimal  observation  interval 
with  respect  to  the  phases.  (See  for  reference 

maps  giving  evidence  of  this  property.) 

2.  Phases may contain subphases and transitions. Thus, if
D1 is a minimal observation interval, there may exist a
D2 < D1 such that D2 is a minimal observation interval
for subphases (over locality subsets). Then D2 is said
to observe an inner level of phase behaviour with respect
to the phases observed by D1. (See [M1] for direct
evidence of this nesting.)

3.  When T=D and D is a minimal observation interval for some
level of phase behaviour, W(t,T) is a minimally
sufficient estimator of locality. In this case,
m(T) = E/H, where E is the mean number of pages entering
the locality set at a transition and H is the mean phase
holding time. (See [D11] for evidence of this property.)
Note that property (1) implies that page faults are
possible only if (t-D,t) is not contained in a phase.

4.  Suppose Z(t) is the resident set of an arbitrary
nonlookahead memory policy MP. With respect to the DWS
policy, the excess EX(t) is the set of pages in Z(t) but
not in W(t,T); the deficiency DE(t) is the set of pages
in W(t,T) but not in Z(t). When W(t,T) is minimally
sufficient, there will be more references into the sets
DE(t) than into the sets EX(t). This implies that a) a
unit increase in W(t,T) changes the lifetime LT less than
a unit decrease, implying a LT knee near a minimally
sufficient T, and b) policy MP cannot simultaneously
generate a smaller resident set size and page faulting
rate than DWS - i.e., its lifetime at the DWS knee x is
smaller than the DWS lifetime at x. (See [D18] for a
discussion.) Property (b) is to be regarded as a
prediction of the phase/transition model and must be
verified by experiment. It cannot be proved absolutely.
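
Properties (1) and (3) can be illustrated on an idealized reference string. The Python sketch below is an assumption-laden toy, not the DWS analyzer of Appendix B: W(t,T) is taken to be the set of pages referenced in the last T references, phases are disjoint, and each phase cycles through its E locality pages for H references, so the minimal observation interval is D = E. The function names are hypothetical.

```python
def dws_stats(refs, T):
    """Simulate the working set policy with window T.

    A reference at time t faults when the page was never referenced
    or was last referenced more than T references ago.
    Returns (mean working set size, page fault rate).
    """
    last_ref = {}
    faults = 0
    size_sum = 0
    for t, page in enumerate(refs):
        prev = last_ref.get(page)
        if prev is None or t - prev > T:
            faults += 1
        last_ref[page] = t
        # W(t,T): pages whose last reference lies in (t-T, t]
        size_sum += sum(1 for u in last_ref.values() if u > t - T)
    n = len(refs)
    return size_sum / n, faults / n


def phased_string(num_phases, H, E):
    """Idealized string: disjoint phases of length H over E pages each."""
    refs = []
    for p in range(num_phases):
        base = p * E
        refs.extend(base + (i % E) for i in range(H))
    return refs


H, E = 100, 5
refs = phased_string(num_phases=4, H=H, E=E)
s, m = dws_stats(refs, T=E)  # T = D = E for this toy string
print(f"s(T) = {s:.3f}, m(T) = {m:.3f}, E/H = {E / H:.3f}")
```

With T equal to the minimal observation interval, faults occur only at transitions, so the measured fault rate coincides with E/H for this string.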

The  multiple  knee  behaviour  of  lifetime  curves,  the 
dominance  of  DWS  knees,  and  the  irremovable  overshoot  of 
memory  policies  are  discussed  next,  in  terms  of  these 
properties  of  the  phase/transition  model. 

3.3.1  Multiple  Knee  Behaviour 

A knee is a point x on the lifetime curve LT at which
the ratio g(x) = LT(x)/x attains a local maximum. The
primary knee is the knee of highest g(x) value. Denote the
resident set size here by x1 and the lifetime value by
LT(x1). The secondary knee is the knee of second highest
g(x) value. Denote the resident set size here by x2 and the
lifetime value by LT(x2). The level of a knee is its index
in the above scheme: x1, x2, x3, ... . The knee slope is the
value of g(x) at x = x1, x2, ...; see Figure 3.3.1.1.
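
The knee definitions translate directly into a small search over measured operating points. The following Python sketch is one possible reading of them, with a hypothetical function name and made-up lifetime data; it treats only interior points as candidate knees, which is an assumption the text does not state.

```python
def find_knees(points):
    """Rank the knees of a sampled lifetime curve.

    points: iterable of (x, LT(x)) pairs, x = mean resident set size.
    A knee is an interior sample where g(x) = LT(x)/x is a local
    maximum. Returns [(x, LT(x), g(x)), ...] ordered by decreasing
    g, so index 0 is the primary knee, index 1 the secondary knee.
    """
    pts = sorted(points)
    g = [lt / x for x, lt in pts]
    knees = []
    for i in range(1, len(pts) - 1):
        if g[i] > g[i - 1] and g[i] >= g[i + 1]:
            knees.append((g[i], pts[i][0], pts[i][1]))
    knees.sort(reverse=True)
    return [(x, lt, slope) for slope, x, lt in knees]


# Hypothetical lifetime curve samples (pages, references per fault).
curve = [(10, 500), (20, 2000), (30, 2400), (40, 6000), (50, 6300), (60, 6600)]
for level, (x, lt, slope) in enumerate(find_knees(curve), start=1):
    print(f"level {level} knee: x={x}, LT={lt}, slope={slope:.1f}")
```

On coarsely sampled curves the result depends on the parameter grid, which is one reason additional parameter values are sometimes needed for finer resolution.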

According  to  the  foregoing  properties  of  the 
phase/transition  model,  knees  are  related  to  levels  of  phase 
behaviour,  and  the  levels  of  knees  to  the  relative  strengths 
of  phase  levels.  The  phase/transition  model  thus  accounts 
for  multiple  knees. 


Within the framework of the model, we can investigate
the nature of the underlying locality set process. Multiple
knees may arise either from nested phases, or from disjoint
phases, over locality sets of different sizes. (The former
case is noted above and by [M1], the latter by Courtois and
Vantilborgh [C9].) A discrimination test to determine the
underlying nature is possible by observing properties of the
DWS lifetime curve. Let K1 and K2 be adjacent knee resident
set sizes, with K1<K2, and T1<T2 be the associated window
sizes. Define RK = K2/K1 and RT = T2/T1. If the phases are
disjoint, a simple magnification of window would allow
moving from the smaller to the larger knee - that is,
RK/RT would be approximately 1. If phases are nested, the
different time scales of inner and outer phases would force
RK << RT. Based on this, we decided to discriminate as
follows: RK/RT < 0.5 implies nesting, otherwise
disjointness.
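
The discrimination test reduces to comparing two ratios. The sketch below restates it in Python with a hypothetical function name and hypothetical knee data; the 0.5 threshold is the one adopted in the text, while the input validation is an added assumption.

```python
def classify_knee_pair(K1, K2, T1, T2, threshold=0.5):
    """Classify two adjacent DWS knees as arising from nested or
    disjoint phases.

    K1 < K2 are the knee resident set sizes, T1 < T2 the associated
    window sizes. RK = K2/K1, RT = T2/T1; RK/RT below the threshold
    is read as nesting, anything else as disjointness.
    Returns (label, RK, RT, RK/RT).
    """
    if not (0 < K1 < K2 and 0 < T1 < T2):
        raise ValueError("expected 0 < K1 < K2 and 0 < T1 < T2")
    rk = K2 / K1
    rt = T2 / T1
    ratio = rk / rt
    label = "nested" if ratio < threshold else "disjoint"
    return label, rk, rt, ratio


# Hypothetical knee pairs, not values from Table 3.3.1.3.
print(classify_knee_pair(K1=30, K2=45, T1=10_000, T2=60_000))
print(classify_knee_pair(K1=20, K2=40, T1=10_000, T2=22_000))
```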


Table 3.3.1.1 shows that under the DWS policy the eight
trace tapes exhibited one instance of two primary knees
(operating points with the same slope), six secondary knees,
and two tertiary knees. Table 3.3.1.2 shows that PFF
exhibited five secondary knees and one tertiary knee. It is
interesting to note that DWS did not exhibit a tertiary knee
on the one reference string (P7) for which PFF did.

Table 3.3.1.3 shows the data for the underlying locality
experiment. There are no observations of RK/RT being
approximately equal to 1, which indicates that nested
behaviour is a more likely explanation as the cause of
multiple knees. Although working with different data,
Madison and Batson observed similar results [M1].


It should be noted that, when DWS and PFF produced the
same operating point, their parameter values were not
comparable. Typically, the DWS window size was at least
twice as large as the PFF threshold window size. On
reference string P1, for example, DWS and PFF had the same
knee operating point but the DWS window size was 125,500 and
the PFF threshold window size was 39,800. This is not
surprising because the PFF threshold acts as a lower bound
on window size.

Further examination of the tables yields other
knee-related conclusions. There was no evidence indicating
that knees could be ranked in some definite order depending
on resident set size (e.g., the smallest resident set size
knee was not always the primary knee). This observation
supports our contention that we have captured a wide range
of program behaviour in our reference strings, and that
nested localities of different strengths exist. Also, where
DWS and PFF had multiple lifetime knees for the same
reference string, it was not true that the level of the
knees had the same correspondence with resident set size for
both policies. For P7, the DWS primary knee had the smaller
memory size of both DWS knees, whereas the PFF primary knee
had the largest memory size of all PFF knees. Thus, the DWS
and PFF policies did not report phase behaviour in the same
way. Moreover, PFF had no secondary knee for two strings on
which DWS did (P3 and P5); PFF had a tertiary knee on a
reference string different from those where DWS had tertiary
knees.


Figure 3.3.1.1  Multiple knee features


Table 3.3.1.1  DWS multiple knees
(Columns: Ref. Str.; knee features (resident set size,
window size) for the 1st, 2nd, and 3rd level knees. The
entries are illegible in this copy.)


Table 3.3.1.2  PFF multiple knees
(Columns: Ref. Str.; knee features (resident set size,
threshold) for the 1st, 2nd, and 3rd level knees. The
entries are illegible in this copy.)


Table 3.3.1.3  Investigation of phase structure
(Columns: Ref. Str.; resident set sizes of knee A and
knee B; ratio RK>1; ratio RK/RT; ratio RT>1; window sizes of
knee A and knee B. The entries are illegible in this copy.)


3.3.2  Dominance of DWS Knees


The list of basic properties presented at the beginning
of this chapter allows us to expect the DWS policy to
produce a lifetime value exceeding that of other
nonlookahead policies in the vicinity of DWS knees,
especially primary knees. It is shown in [D7,D18] that DWS
dominates in space-time (a unit reduction in memory
increases paging more than a unit increase in space reduces
paging) under the assumptions of the phase/transition model.
No policy can simultaneously reduce both paging and mean
resident set size over the dominant policy.

This statement does not apply to lookahead policies.
However, when significant locality set variations occur
among phases, even fixed space lookahead policies (like OPT)
can be dominated by DWS. The reason is that OPT is forced
to generate paging within those phases whose locality sets
exceed the OPT resident set size; in contrast, DWS can be
adjusted to track the locality process well and generate no
paging within phases. If the phases in which OPT's resident
set is deficient relative to the locality set are long, even
OPT can be dominated by the DWS policy. Simple reference
strings are given in [D10] showing cases of OPT deficiency.


The experimental data confirms the dominance of DWS
knees (Table 3.3.2.1). The DWS lifetime value at its
primary knee was never exceeded on the eight reference
strings by the PFF and LRU nonlookahead policies; on two of
the strings, the DWS lifetime knee value was the same as the
PFF lifetime value. The memory size ranges for DWS lifetime
knee dominance are also given in Table 3.3.2.1; while the
ranges and associated window sizes vary from program to
program, it is generally true that they are both quite
large.


Ref.    DWS dominant over      DWS dominant
Str.    nonlookahead           over OPT

P1      = PFF                  no
P2      yes                    yes
P3      yes                    no
P4      yes                    yes
P5      yes                    yes
P6      yes                    yes
P7      yes                    no
P8      = PFF                  yes

[The memory range, window range, DWS lifetime at DWS knee,
and OPT lifetime at DWS knee columns are illegible in this
copy.]

Table 3.3.2.1  Dominance of DWS lifetime value at its knee

The DWS lifetime knee value exceeds the OPT lifetime at
the DWS primary knee size in five of the eight reference
strings (Table 3.3.2.1), confirming the existence in actual
reference strings of OPT resident set deficiency.

Another feature of dominance concerns the lifetime
primary knee slope g(x1) = LT(x1)/x1. We observed that the
DWS lifetime knee slope always dominated the knee slopes of
the PFF and LRU nonlookahead policies on the eight reference
strings (Table 3.3.2.2). On six of the strings, the DWS and
PFF lifetime knee slopes were the same; on the other two,
the DWS knee slope was clearly larger. Where the DWS and
PFF lifetime knee slopes were the same, PFF never had a
smaller mean memory allocation at its knee. (The gap
behaviour of PFF made the accurate interpretation of this
remark difficult for reference string P5.) We also observed
that the DWS lifetime knee slope dominated the OPT lifetime
knee slope in three of the eight reference strings. The
function g(·) is important in space-time performance measure
considerations and will be discussed further in Section
3.4.1.


3.3.3  Irremovable Overshoot

While the phase behaviour covers the majority of virtual
time [d1], the transition behaviour usually produces the
majority of page faults. The reaction of the VMIN and DWS
policies to a transition is suggested in Figure 3.3.3.1.
Assume that the window size T is larger than D, the time
interval during which every locality set page is referenced.
The resident sets are then identical beginning T units after
a phase begins and until D units before a phase ends, and
paging results solely from new locality pages entering at
the transition. Because the page fault sequences for VMIN
and DWS for the same window size are identical [D16], the
operating points have the same LT(·) [=1/m(T)] value but
different mean resident set sizes, the difference arising
from VMIN's ability to anticipate a transition. In other
words, the quantity s(T)-v(T) estimates the average area
between the curves in Figure 3.3.3.1.

Ref.    DWS slope dominant
Str.    over nonlookahead

P1      = PFF
P2      = PFF
P3      yes
P4      = PFF
P5      = PFF
P6      yes
P7      = PFF
P8      = PFF

[The remaining columns (DWS slope dominant over OPT; DWS
knee resident set size and lifetime; knee slope in
references per page) are illegible in this copy.]

Table 3.3.2.2  Dominance of DWS lifetime knee slope


Figure 3.3.3.1  Reaction of VMIN and DWS to a transition
at time t (resident set size versus virtual time)


Denote as an overshoot the excess resident set arising
when a nonlookahead policy fails to anticipate that the next
reference to the page just referenced occurs more than T
units in the future. Because an overshoot occurs only at
the beginning of an interreference interval exceeding T, and
because DWS keeps such a page resident for T-1 time units
longer than VMIN, the contribution to an overshoot is
averaging (T-1)·m(T) [amount · probability]. Thus, we have
s(T) - v(T) = (T-1)·m(T) = (T-1)/LT(s(T)), the equality being
approximate because the reference string is finite (see
Figure 3.3.3.2).
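
The relation s(T) - v(T) = (T-1)·m(T) can be checked on a synthetic string. The Python sketch below is a hedged illustration, not the report's analyzers: it assumes DWS keeps a page for T references after its last use, and that VMIN keeps a page only when its next reference is at most T references ahead and otherwise drops it immediately after use. The function name and the test string are hypothetical.

```python
def dws_vmin_sizes(refs, T):
    """Return (s, v, m): mean resident set sizes of DWS and VMIN
    and their common page fault rate for window size T."""
    n = len(refs)
    # Forward pass in reverse: time of the next reference to each page.
    next_ref = [None] * n
    upcoming = {}
    for t in range(n - 1, -1, -1):
        next_ref[t] = upcoming.get(refs[t])
        upcoming[refs[t]] = t

    last = {}
    faults = 0
    dws_time = 0   # page-time accumulated by DWS
    vmin_time = 0  # page-time accumulated by VMIN
    for t, page in enumerate(refs):
        prev = last.get(page)
        if prev is None or t - prev > T:
            faults += 1
        last[page] = t
        d = None if next_ref[t] is None else next_ref[t] - t
        if d is not None and d <= T:
            dws_time += d            # both keep the page until reuse
            vmin_time += d
        else:
            dws_time += min(T, n - t)  # DWS holds it for T units (or to the end)
            vmin_time += 1             # VMIN releases it right after use
    return dws_time / n, vmin_time / n, faults / n


# Idealized string: 4 disjoint phases of 100 references over 5 pages each.
refs = [p * 5 + (i % 5) for p in range(4) for i in range(100)]
T = 5
s, v, m = dws_vmin_sizes(refs, T)
print(f"s - v = {s - v:.3f}, (T-1)*m = {(T - 1) * m:.3f}")
```

The two printed quantities differ slightly because pages last referenced near the end of the string are cut off before DWS has held them for the full T units, which is the finite-string effect mentioned in the text.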

It is possible to invent modified DWS (MDWS) policies
that clip the overshoot; Smith has such a study [S14]. The
idea is to detect when a transition is in progress and
deallocate pages (of the old locality set) not referenced
since the start of the transition. Criteria for detecting
transitions include:

1.  a run of interfault intervals shorter than the recent
average, or a run of LRU stack distances longer than
the recent average, or

2.  the use of two windows, T'<T: if the oldest LRU page
in the DWS resident set has age not exceeding T', no
transition is in progress (see [S14] for further
details).

In general, there is some delay d required to observe that a
transition is in progress, so that a modified DWS scheme
will tend to produce a resident set process as in Figure
3.3.3.3. Under the assumptions of the phase/transition
model, the MDWS policy will have the same page fault
sequence as DWS and VMIN. Results in [S14] show that MDWS
typically achieves a 5 percent resident set size reduction
from DWS. This suggests that some clipping of overshoot is
possible, although the overhead for the modified scheme may
not be justifiable. Note that, even if d -> 0, there is
still some overshoot because VMIN anticipates coming
transitions; this is irremovable overshoot.

The phase/transition model predicts that PFF overshoots
more than DWS, because PFF has no upper limit, such as T, on
how far into a phase it progresses before removing old
locality pages. Note also that, when PFF detects a
transition in progress (interfault intervals shorter than
THRESH), it does not attempt to clip the overshoot, but
instead allocates more pages. An instance of DWS and PFF
overshoot is given in Figure 3.3.3.4. Two types of PFF
overshoot can be measured, one at the DWS knee lifetime
value (PFF*) and the other at the PFF knee. The former type
is based on the idea that the DWS knee is a minimally
sufficient estimator of locality in the phase/transition
model. The latter type is based on the idea that the PFF
primary knee is a suitable operating point for PFF. Table
3.3.3.1 presents data for the latter type, because the DWS
lifetime knee dominance shown in Section 3.3.2 implies that
the PFF overshoot at the DWS knee will necessarily never be
smaller than the DWS overshoot at that point.


Figure 3.3.3.2  Difference in resident set sizes for
VMIN and DWS


Figure 3.3.3.3  Reaction of MDWS to a transition at time t


Figure 3.3.3.4  Overshoots of DWS and PFF at their knees


Table 3.3.3.1  DWS and PFF overshoot data

[Columns: Ref. Str.; DWS Overshoot; PFF Overshoot; whether the
DWS overshoot is less (<), more (>), or equal (=). Rows:
reference strings P1-P8.]


On the eight reference strings, DWS and PFF had
comparable overshoot on two strings, DWS had a lower
overshoot on five strings, and no result can be determined
for one string, P5. (The reference string P5 is in doubt
for two reasons. One is that DWS has two primary knees of
overshoots 8 and 18. The other is the gap behaviour of PFF
on P5, making the accurate determination of the PFF knee
difficult.) Table 3.3.3.1 provides evidence that DWS
produces less overshoot in estimating locality sets than PFF
does at appropriate operating points.

3.3.4  Summary 

The  phase/transition  model  of  program  behaviour  accounts 
for  the  occurrence  of  multiple  knees  in  the  lifetime 
function  of  a  memory  policy  which  tracks  locality  well.  We 
have  empirically  observed  the  multiple  knee  phenomenon  in 
general;  it  is  exhibited  to  a  greater  degree  by  the  DWS 
policy  than  by  the  PFF  policy.  Although  the  model 
associates  knees  with  disjoint  or  nested  phases,  the 
experiments  show  that  nesting  is  the  more  likely 
explanation.

We have outlined an argument that implies the dominance
of the DWS knee lifetime value over the lifetime values of
other nonlookahead policies in the vicinity of the DWS knee.
This property has been verified. Moreover, the
phase/transition model suggests that, when there are
significant locality set size variations, DWS may dominate
the OPT lifetime value near the DWS knee. This property has
also been verified.

We  found  that  DWS  produced  less  overshoot  (and  hence 
less  locality  set  estimation  error)  than  PFF  at  their 
respective  knees. 

These results are all compatible with the general view
that  the  phase/transition  model  of  program  behaviour  can  be 
used  to  predict  empirical  phenomena.  Our  purpose  has  not 
been  to  demonstrate  that  this  model  is  the  ultimate  model  of 
program  behaviour,  but  to  show  that  it  can  explain  several 
important  empirical  observations.  Future  models  may  be 
better,  but  at  present  the  weight  of  experimental  evidence 
supports  the  phase/transition  model  best. 


MOM  about  performance  measures 


Because  system  throughput  is  maximized  when  memory 
space-time  is  minimized,  and  because  the  lifetime  primary 
knee  minimizes  the  contribution  to  space-time  due  to  paging, 
we are led to investigate the relations between knees and
memory space-time minima, and between memory space-time
minima  and  system  throughput.  The  conclusion  is  that  DWS 
primary  knees  correlate  strongly  with  primary  space-time 
minima,  and  with  maximal  system  throughput  when  mean  paging 
I/O  service  time  is  not  too  large. 

These relations are examined in Section 3.4.1 in terms
of the reference string trace data. They are further
discussed in Sections 3.4.2 and 3.4.3 in terms of a queueing
network model.


3.4.1  The Knee Criterion

Consider a system having main memory size M; observe the
system operation for Z time units, during which J jobs
complete execution. The observed throughput, TP, of the
system is then J/Z. The average space-time per job, ST*, is
(M*Z)/J = M/(J/Z) = M/TP. The latter equation indicates
that average space-time per job is minimized exactly when
system throughput is maximized. (This argument is similar
to one used by Buzen [B20].) This relation allows us to
study the minima of space-time curves in order to locate
throughput maxima.
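
The identity ST* = M/TP can be checked numerically. The following
is a minimal sketch only; the function names and the sample values
of M, Z, and J are illustrative assumptions, not figures from the
experiments.

```python
def throughput(jobs_completed, observation_time):
    """Observed throughput TP = J / Z."""
    return jobs_completed / observation_time


def space_time_per_job(memory_size, jobs_completed, observation_time):
    """Average space-time per job, (M * Z) / J."""
    return memory_size * observation_time / jobs_completed


# Illustrative values only (assumed): 300 pages, 60 time units, 12 jobs.
M, Z, J = 300, 60.0, 12
tp = throughput(J, Z)
st = space_time_per_job(M, J, Z)
```

Because M is fixed, any change that raises TP lowers ST* by the
same factor, which is the sense in which the two optima coincide.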

As argued earlier, an approximate space-time formula is:

ST*(x') ≈ A1*L*x'*(1 + A*F(x'))

        = A1*L*x' + A2*L*x'/LT(x')

where L is the reference string length, A1 is the main
memory access speed, A2 is the auxiliary memory access
speed, A=A2/A1, F is the page fault rate function, and x' is
the mean memory allocation of a variable partition memory
policy. (The second line uses F(x') = 1/LT(x').) Because
choosing x' to be the primary knee
maximizes LT(x')/x', operating at this knee minimizes the
contribution to space-time due to paging. However, it may
not minimize the space-time itself, owing to the term in x'
itself and to the formula's being an approximation of true
space-time. Nonetheless, this argument suggests that an
experimental study of the relation between knees and
space-time minima would be fruitful.
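
As a hedged illustration of this argument, the sketch below
evaluates the approximate formula on a small, made-up lifetime
curve and locates the primary knee as the allocation maximizing
LT(x)/x. The curve data, the parameter values, and the function
names are assumptions chosen for illustration; they are not the
measured reference strings.

```python
def approx_space_time(x, lifetime, L, A1, A2):
    """ST*(x) = A1*L*x + A2*L*x / LT(x)."""
    return A1 * L * x + A2 * L * x / lifetime


def primary_knee(curve):
    """Allocation x that maximizes LT(x)/x over (x, LT(x)) pairs."""
    return max(curve, key=lambda point: point[1] / point[0])[0]


# Hypothetical lifetime curve: (mean allocation in pages, lifetime in references).
curve = [(10, 200.0), (20, 1000.0), (30, 2400.0), (40, 2800.0), (50, 3000.0)]
L, A1, A2 = 1_000_000, 1e-6, 1e-2   # assumed string length and access times (sec)

knee = primary_knee(curve)
st_minimum = min(curve, key=lambda p: approx_space_time(p[0], p[1], L, A1, A2))[0]
```

In this made-up curve the knee and the space-time minimum fall at
the same allocation; as the text notes, the linear term in x' means
this need not hold in general.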

Because the ST* formula is an approximation, it was
decided to measure the true ST directly and check the
correspondence with lifetime knees. The "true" ST, for a
mean memory allocation of x pages, is measured according to
the formula

ST = L*A1*x + A2 * SUM over i of s(t(i)),

where the summation is over all fault times t(i) and s(t) is
the resident set size at time t. ST is not
the actual space-time arising from the quantity M/TP,
because it does not account for non-swapping delays in the
system (e.g., I/O and various queueing). Thus, a further
study of the relation between space-time and throughput will
be presented (the queueing network in the next subsection).
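
A minimal sketch of this measurement follows. It assumes that the
trace supplies the resident set size s(t) at every reference and
the list of fault times t(i), and that each fault charges the
current resident set for one auxiliary memory access of length A2.
The function name and the tiny trace are hypothetical.

```python
def measured_space_time(resident_sizes, fault_times, A1, A2):
    """ST = L*A1*x + A2 * (sum of s(t(i)) over fault times t(i)).

    resident_sizes[t] is the resident set size s(t) at virtual time t,
    one entry per reference; fault_times lists the indices t(i).
    """
    L = len(resident_sizes)
    x = sum(resident_sizes) / L                      # mean memory allocation
    paging = sum(resident_sizes[t] for t in fault_times)
    return L * A1 * x + A2 * paging


# Hypothetical six-reference trace with faults at times 0, 1, and 3.
sizes = [1, 2, 2, 3, 3, 3]
faults = [0, 1, 3]
st = measured_space_time(sizes, faults, A1=1.0, A2=10.0)
```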


The results are summarized in Tables 3.4.1.1-3.4.1.3.
The nth level knee (n=1,2,3) is the nth highest maximum in
the function LT(x)/x. The nth level space-time minimum is
the nth smallest minimum. Only where a policy has a knee of
a certain level in its lifetime curve and a local minimum of
the same level in its space-time curve is an entry recorded
in a table. The tables indicate that the nth level knees of
the lifetime functions of the DWS, PFF, and LRU policies
produce space-time values which correspond closely to the
nth level local minima of the associated space-time curves.
DWS has the lowest average relative percentage difference
between the knee space-time value and the corresponding
local minimum space-time value; it also has the lowest
maximum relative percentage difference. PFF ranks next, LRU
last. The data also shows a tendency for the average
relative percentage difference to grow with the level of the
knee.
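
The definitions of nth level knees can be sketched as follows.
This is an illustration under stated assumptions: local maxima are
taken as interior points strictly above both neighbours, and the
relative percentage difference is assumed to be measured against
the local minimum value; the original may have used slightly
different conventions. All names and sample data are hypothetical.

```python
def local_maxima(values):
    """Indices of interior points strictly higher than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def nth_level_knees(xs, lifetimes):
    """Allocations at the local maxima of LT(x)/x, highest maximum first."""
    ratios = [lt / x for x, lt in zip(xs, lifetimes)]
    peaks = sorted(local_maxima(ratios), key=lambda i: ratios[i], reverse=True)
    return [xs[i] for i in peaks]


def relative_percentage_difference(st_at_knee, st_at_minimum):
    """Percentage by which the knee space-time differs from the local minimum."""
    return 100.0 * abs(st_at_knee - st_at_minimum) / st_at_minimum


# Hypothetical curve with two knees (LT(x)/x peaks at x=40 and x=20).
xs = [10, 20, 30, 40, 50, 60]
lifetimes = [100.0, 600.0, 600.0, 2000.0, 2000.0, 2100.0]
knees = nth_level_knees(xs, lifetimes)
```

Here knees[0] is the primary (1st level) knee and knees[1] the
secondary knee.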


The tables show that the correspondence between knees
and local space-time minima is extremely strong (0.43 percent
relative error on average) for the DWS policy. This
correspondence is also present, though not as strongly, for
the PFF and LRU policies (2.43 and 3.40 percent relative
error on average, respectively). This evidence strongly
supports the plausibility argument relating the knees of the
DWS lifetime curve to the local space-time minima.


The data for PFF show a good correspondence on the
average between lifetime knees and local space-time minima.
However, PFF shows a high relative percentage difference on
reference string P6. On P6, PFF's primary knee corresponds
to its secondary local space-time minimum and its secondary
knee corresponds to its global space-time minimum. This
error produced a relative percentage difference in both
cases of over 10 percent. On P8, DWS's primary knee
corresponds to its third level space-time minimum and its
third level knee corresponds to its global space-time
minimum. However, for DWS this error only produced a
relative percentage difference in both cases of about 1.5
percent. Although the DWS primary knee may not always
correspond to its global space-time minimum, it produces
nearly the same space-time as the minimum, a statement not
always true for PFF.


The data for LRU shows the weakest correspondence. Both
the average and maximum relative percentage difference for
the primary knee were the highest for LRU (but similar to
PFF). No secondary knees or secondary local space-time
minima were observed for LRU on any of the strings. LRU
also has the troublesome feature that locating the minimum
is not simple. An LRU space-time curve typically has a
sharp minimum (Appendix D and [C2]). It is very difficult
to find this operating point systematically without
computing the space-time formula. DWS and PFF, in contrast,
appear to be controllable in that the knees and minima can
be found by indirect methods (Section 3.5).


Table 3.4.1.1  Relationship of nth level lifetime knees and
space-time local minima for DWS


Table 3.4.1.2  Relationship of nth level lifetime knees and
space-time local minima for PFF


Table 3.4.1.3  Relationship of nth level lifetime knees and
space-time local minima for LRU

[Tables 3.4.1.1-3.4.1.3 share one layout: one row per reference
string (P1-P8) and, for each of the 1st, 2nd, and 3rd levels,
the columns LT Knee, ST at Knee, Min ST Knee, ST at Min, and
Rel. Perc. Diff.; the final rows give the mean and maximum
percentage difference.]


Table 3.4.1.4 compares the policies in terms of their
global space-time minima. Not surprisingly, OPT achieved
the lowest space-time on 5 of the 8 reference strings,
whereas DWS achieved the lowest space-time on the other 3.
Among the nonlookahead policies, DWS was best for 3 strings
and within 9 percent of the best on 7 strings. The
superiority of LRU on string P5 may be related to the
reference structure generating PFF's erratic behaviour.

The major results of this section can be summarized as
follows. It was proved that maximum system throughput is
achieved by minimizing average memory space-time per job.
It was argued that minimum space-time per job was related to
operating that job at the knee of its lifetime curve.
Experiments on the reference strings support strongly the
claim that the primary knee of a memory policy's lifetime
curve corresponds to the resident set size of its global
space-time minimum. This claim is most supportable for DWS
(DWS knees were space-time minimal for three reference
strings and within 9 percent for four others), least for
LRU. PFF exhibited erratic behaviour in one reference
string; DWS exhibited no such behaviour.


These experiments do not test the hypothesis that DWS
performs at or near the best among all nonlookahead
policies. A simple argument in terms of the
phase/transition model suggests that no significant
deviations would be observed if other nonlookahead policies
were tested. The space-time formula ST for memory
constraint x can be compactly expressed in the form

ST(x) = B1*x + B2*h(x),

where B1 and B2 are suitable constants (independent of
memory policy), and h(x) is the ratio x/LT(x). Letting
B=B1/B2, this can be rewritten and normalized to

ST(x) = B*x + h(x)

Suppose the pair (x,h), where h=h(x), is generated by a DWS
policy at its primary knee. (This may possibly be an
overshoot-clipping modified DWS.) Let the pair (x',h'),
where h'=h'(x'), be generated by another nonlookahead
policy. We want to show that

ST(x) < ST(x'), or equivalently,

B*(x-x') < h' - h.

Consider Figure 3.4.1.1. If the final inequality above
holds, diagram (a) must hold when x' < x. Case (a) will
also certainly satisfy the inequality when x' > x. However,
when x' > x, diagram (b) may also hold.


Table 3.4.1.4  Space-time global minima behaviour


Consider case (a). If there exists a policy whose
operating points (e.g., u) fall above the DWS knee slope
line, we have a violation of the principle of space-time
dominance in the phase/transition model; this principle,
which shows that the lifetime value of DWS dominates other
nonlookahead lifetimes near the DWS knee, implies that
decreases in resident set size have a more-than-linear
penalty in lifetime. Thus, to the extent that this
principle holds in practice, it is unlikely that another
memory policy can generate a point u in case (a).

The only possible violation of the inequality for x' > x
is when h' < B*(x-x') + h. The data show that h(.) is
typically a small number, less than 1. The ratio B is in
essence the reference string length divided by the memory
access time ratio; for the data, this is approximately
(1.3*10^6)/10^4, or at least 100. Thus, for small increments
in x', x-x' will be negative and B*(x-x') + h will be
negative. Because h' is positive (by definition), the
typical numbers render the violation impossible.
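
The arithmetic of this argument can be sketched as below. The
value B = 130 follows the approximate ratio quoted above; the
operating points (x, h) and (x', h') are made-up numbers, and the
function names are illustrative assumptions.

```python
def normalized_space_time(x, h, B):
    """Normalized space-time ST(x) = B*x + h."""
    return B * x + h


def violates_dominance(x, h, x_other, h_other, B):
    """True when the other policy's point (x', h') beats the DWS knee (x, h),
    i.e. when h' < B*(x - x') + h."""
    return h_other < B * (x - x_other) + h


B = 130.0                 # roughly (1.3*10^6)/10^4
x, h = 30.0, 0.5          # hypothetical DWS knee point, with h < 1
x_other, h_other = 31.0, 0.01   # hypothetical competitor using one more page
```

With x' one page larger than x, the right-hand side B*(x-x') + h is
about -129.5, so no positive h' can satisfy the violation condition.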


3.4.2  Queueing  Network  Model  Description 


The  lifetime  and  space-time  curves  consider  a  program  in 
isolation;  they  are  measures  of  the  program's  memory  demand 
and  do  not  account  for  interactions  among  programs  -  e.g., 
queueing.  A  multiprogramming  experiment  can  provide  such 
insight.  Our  model  for  a  multiprogramming  experiment  is  a 
queueing  network  model.  It  is  a  simple,  cost-effective  way 
of  testing  relations  between  system  performance  measures 
(throughput,  response  time,  etc.)  and  program  measures. 


The queueing networks used here assume one class of
programs competing equally for resources. The queueing
studies strongly verify the value of policies based on
program measures. Though limited in scope (e.g., they are
not multiclass), the experiments suggest strongly that the
results for the policies generalize. General network
studies were not undertaken because our primary interest is
in program behaviour. A more comprehensive network study
has been given by Kahn [K1].


Figure 3.4.1.1  Testing the dominance of the DWS knee lifetime


Because changes in load are much less frequent than
transitions between stations in the network by active
programs, we may use the equilibrium value of throughput at
load n as a good approximation to the true throughput [C5].
Moreover, queueing networks have been found quite reliable
for predicting throughputs [B19].

The model we use to represent a multiprogrammed computer
system is the one introduced in Section 2.1.1. It is the
same as used by Denning in a similar study [D6]; we adopt
Buzen's notation, as indicated in Figure 3.4.2.1 [B15].
Station 1 is the processor station, station 2 is the file
I/O station, and station 3 is the paging I/O station. The
q(i) of Figure 3.4.2.1 represent transition probabilities,
where we have eliminated the passive network and have used a
simple feedback loop at the processor station. We have, of
course,


q(1) + q(2) + q(3) = 1.

The network parameters q(i) are derivable from the program
parameters. Let T(i) denote the total service requirements
at station i. T(1) and T(2) are intrinsic to the program;
T(3) is affected by the memory policy used at the processor
station and can vary over a wide range. Let a(2) denote the
rate at which a program requests file I/O; its total number
of file I/O requests is T(1)*a(2) and the total time to
service them is T(2) = (T(1)*a(2))/mu(2). Here mu(i) denotes
the service rate of station i; it is the reciprocal of the mean
service time at station i. Similarly, if a(3) is the
program's paging rate, then T(3) = (T(1)*a(3))/mu(3). The
mean number of passes through the processor station is
1/q(1), because the program chooses independently the
feedback loop with probability q(1) on every processor
station departure. Since the mean time per pass at the
processor is 1/mu(1), we have

T(1) = 1/(q(1)*mu(1))                          (3.4.2.1)


Of the 1/q(1) passes a program makes on the processor
station, 1/q(1) - 1 of them were to an I/O station (the last
exited on the feedback loop); this is equal to the total
number of I/O requests, T(1)*(a(2) + a(3)). We have

q(1) = 1/(1 + T(1)*(a(2) + a(3)))              (3.4.2.2)


Figure 3.4.2.1  Queueing network model


(3.4.2.1) and (3.4.2.2) together imply that
mu(1) = a(2) + a(3) + 1/T(1). The number of visits a
program makes to the file I/O station is T(1)*a(2), which is
equal to q(2)/q(1). This implies

q(2) = T(1)*a(2)*q(1) = T(1)*a(2)/(1 + T(1)*(a(2) + a(3)))    (3.4.2.3)


Similarly,

q(3) = T(1)*a(3)*q(1) = T(1)*a(3)/(1 + T(1)*(a(2) + a(3)))    (3.4.2.4)


Note that the sum of the q(i) is 1, as required.
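
Equations (3.4.2.1)-(3.4.2.4) translate directly into a few lines
of code. The sketch below is illustrative only: the function names
are assumptions, and the sample rates (a(2) = 4 and a(3) = 50
requests per second) are made-up values used to check that the
probabilities sum to 1.

```python
def transition_probabilities(T1, a2, a3):
    """Return (q1, q2, q3) from equations (3.4.2.2)-(3.4.2.4)."""
    denom = 1.0 + T1 * (a2 + a3)
    return 1.0 / denom, T1 * a2 / denom, T1 * a3 / denom


def processor_service_rate(T1, a2, a3):
    """mu(1) = a(2) + a(3) + 1/T(1), implied by (3.4.2.1) and (3.4.2.2)."""
    return a2 + a3 + 1.0 / T1


# Assumed sample values: T(1) in seconds, a(2) and a(3) in requests per second.
T1, a2, a3 = 1.299968, 4.0, 50.0
q1, q2, q3 = transition_probabilities(T1, a2, a3)
mu1 = processor_service_rate(T1, a2, a3)
```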

The parameter a(3) is of particular importance to us.
It is the rate at which a program requests paging I/O
service; in other words, it is the program's page fault rate
and is certainly affected by the memory policy used. A high
level of multiprogramming implies a small memory allocation
for each program, which usually implies a high paging rate,
that is, a large value of a(3).

After deriving the parameters q(i) and knowing the
parameters a(i) and mu(i), we are in a position to use
Buzen's computational techniques to solve for the system
throughput for various levels of multiprogramming and for
various memory policies [B17,B18]. Our procedure is: for a
given level of multiprogramming, n, determine the memory
allocation for each active program, assuming an
equipartition (i.e., z(i) = M/n, for each program i; this
is legitimate because we have one class of programs).
Determine the program's page fault rate for this memory
allocation under the particular memory policy being
considered: this is the value of a(3). Evaluate the
queueing network to find the system throughput for the given
parameters. Repeat for the next level of multiprogramming,
n+1. These operations are repeated for each memory policy
and program under consideration.
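
The procedure above can be sketched in code. This is a minimal
illustration, not the program used for the experiments: it assumes
load-independent exponential stations and uses the standard
convolution recurrence for the normalizing constants G(n), which
may differ in detail from the techniques of [B17,B18]. The
lifetime function LT(x) = 50*x^2 and all function names are
hypothetical.

```python
def transition_probabilities(T1, a2, a3):
    """q(1), q(2), q(3) from equations (3.4.2.2)-(3.4.2.4)."""
    denom = 1.0 + T1 * (a2 + a3)
    return 1.0 / denom, T1 * a2 / denom, T1 * a3 / denom


def normalizing_constants(loads, max_load):
    """Convolution recurrence G[n] += X * G[n-1], applied once per station."""
    G = [1.0] + [0.0] * max_load
    for X in loads:
        for n in range(1, max_load + 1):
            G[n] += X * G[n - 1]
    return G


def throughput_curve(M, T1, a2, mu2, mu3, fault_rate, max_n):
    """System throughput (jobs per second) for loads n = 1..max_n."""
    curve = {}
    for n in range(1, max_n + 1):
        a3 = fault_rate(M / n)                      # equipartition of memory
        q1, q2, q3 = transition_probabilities(T1, a2, a3)
        mu1 = a2 + a3 + 1.0 / T1
        # Relative loads: visit ratio (1, q2, q3) divided by service rate.
        G = normalizing_constants([1.0 / mu1, q2 / mu2, q3 / mu3], n)
        curve[n] = q1 * G[n - 1] / G[n]             # jobs leave via the feedback loop
    return curve


def example_fault_rate(x):
    """Hypothetical policy: LT(x) = 50*x**2 references of 1 microsecond each."""
    return 1.0 / (50.0 * x * x * 1e-6)


curve = throughput_curve(M=300, T1=1.299968, a2=4.0, mu2=40.0, mu3=200.0,
                         fault_rate=example_fault_rate, max_n=15)
```

For n=1 the result reduces to 1/(T(1) + T(2) + T(3)), which gives a
simple sanity check on the recurrence.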

The comparison of memory policies with respect to system
throughput can take several forms. One is the maximum
system throughput and the corresponding level of
multiprogramming produced. Another measures the robustness
of the policy in terms of the width of the plateaux of the
throughput curve. We report two plateaux, corresponding to
load ranges that operate within 5 percent or 10 percent of
maximum; see Figure 3.4.2.2. (In general, a "p plateau"
for a given throughput curve is defined by a maximal load
range (N1,N2) such that a load n in (N1,N2) implies that the
throughput is within p percent of the maximum.) For the
variable partition policies, it is also interesting to
investigate how easily the policy parameter can be set in
order to achieve these plateaux.


Figure 3.4.2.2  5 and 10 percent plateaux on the
throughput curve

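
A plateau can be extracted from a throughput curve as sketched
below. The sketch assumes integer loads, treats the range as
inclusive of its endpoints, and grows it outward from the load of
maximum throughput; the function name and the sample curve are
hypothetical.

```python
def plateau(throughputs, p):
    """Return (N1, N2): the contiguous load range around the peak whose
    throughput stays within p percent of the maximum.

    throughputs maps each integer load n to its throughput TP(n).
    """
    peak = max(throughputs, key=throughputs.get)
    floor = throughputs[peak] * (1.0 - p / 100.0)
    low = high = peak
    while low - 1 in throughputs and throughputs[low - 1] >= floor:
        low -= 1
    while high + 1 in throughputs and throughputs[high + 1] >= floor:
        high += 1
    return low, high


# Hypothetical throughput curve (jobs per second) for loads 1..7.
tp = {1: 0.50, 2: 0.70, 3: 0.77, 4: 0.80, 5: 0.78, 6: 0.73, 7: 0.60}
five_percent = plateau(tp, 5)
ten_percent = plateau(tp, 10)
```

A wider plateau means the policy tolerates a wider range of loads
without a significant loss of throughput.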

Note that, when the closed network is modified by
inserting a finite source of N independent users, each of
think time T, at point A, the response time per transition
is (N/TP(n)) - T, which is minimized when TP(n) is maximum [B20].
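
The response time relation is easy to check numerically. The
values of N, T, and TP below are illustrative assumptions only.

```python
def response_time(num_users, throughput, think_time):
    """Response time W = N/TP - T for N users with think time T."""
    return num_users / throughput - think_time


# Assumed values: 20 users, 15 seconds of think time, two throughput levels.
w_low_tp = response_time(20, 0.5, 15.0)
w_high_tp = response_time(20, 0.8, 15.0)
```

With N and T fixed, the higher throughput gives the lower response
time, so maximizing TP(n) minimizes W.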


3.4.3  Queueing Network Model Results


For these experiments, the mean file I/O service time
(1/mu(2)) was set to 25 msec, the total CPU time needed to
1.299968 sec (corresponding to 1,299,968 references of 1
microsecond each, the length of our reference strings), the
main memory access time (A1) to 1 microsecond, and the main
memory size to 300 pages. The degree of multiprogramming was
varied from n=1 to n=15 jobs. The experiments were run
twice for different mean paging I/O service times (1/mu(3)
or S), once with 5 msec and again with 10 msec. For
convenience, we arbitrarily (but not unrealistically)
assumed that the parameter a(2) - the rate at which the
program requested file I/O service - was one-tenth of the
disk service rate (i.e., a(2) = mu(2)/10).

Whether DWS or PFF allowed a higher load at suitable
operating points was investigated. The suitable operating
points were taken to be the lifetime knee resident set size
and the maximum throughput resident set size. Recall that
we showed in Section 3.4.1 that there was a strong empirical
correspondence between lifetime knees and global space-time
minima, and a formal correspondence between space-time
minima and maximal system throughput. The data is displayed
in Table 3.4.3.1. DWS generally allowed a higher load at
the lifetime knee; the superiority of PFF on string P5 is
questionable because of its erratic behaviour on that
string. The different behaviour of DWS and PFF for the two
values of S is noticeable. For S=5, DWS generally allowed a
higher load than PFF at the maximum throughput operating
point. For S=10, the situation was reversed. It appears
from this data that PFF works slightly better than DWS when
S is large compared to the knee lifetime.

Note that the response time W was given in Section 3.4.2
as (N/TP(n)) - T. Denote the optimal load by n0; if
n0(PFF) < n0(DWS) but TP(n0(PFF)) = TP(n0(DWS)), then
W(PFF) = W(DWS). In this sense (as supported by Tables
3.4.3.3 and 3.4.3.4), it makes no difference which policy
permits higher load. Both may be processing jobs at the
same rate. The higher load, n0(DWS), might be more
interesting for other reasons not accounted for in the model
(e.g., it implies better memory utilization and less memory
cost per job).


Table 3.4.3.1  DWS and PFF loads at lifetime knees
and maximum throughputs


The load ranges which DWS and PFF allowed for the 5
percent and 10 percent plateaux on the throughput curves
were investigated (see Table 3.4.3.2). With two values of
S, there were four combinations of p and S values; DWS
allowed a larger load range in three of the four; PFF was
marginally better for S=10 on the 5 percent plateau. This
data indicates that DWS is more robust than PFF.

The relationship between the lifetime knee and the
maximum throughput for the DWS, PFF, and LRU policies was
investigated. The data is displayed in Tables 3.4.3.3-
3.4.3.5. The lifetime knee system throughput, the maximum
throughput, the 5 percent plateau, and the 10 percent
plateau are shown. The effect of the two mean paging I/O
service times is very striking in this data. Consider the
DWS data:

1)  For S=10, 3 strings had lifetime knee system
throughput values on the 5 percent plateau and 4 strings
were on the 10 percent plateau.

2)  For S=5, 6 strings were on both the 5 percent and 10
percent plateaux.

Consider the PFF data:

1)  For S=10, 5 strings were on both the 5 percent and 10
percent plateaux.

2)  For S=5, 5 strings were on the 5 percent plateau and
6 strings were on the 10 percent plateau.

Consider the LRU data:

1)  For S=10, 1 string was on the 5 percent plateau and 2
strings were on the 10 percent plateau.

2)  For S=5, 4 strings were on the 5 percent plateau and
6 strings were on the 10 percent plateau.

Thus,  the  correspondence  between  the  lifetime  knee  and 
maximum  throughput  operating  points  (the  knee  criterion)  for 
the  value  S=5  was  very  good  for  all  policies;  at  the  value 
S=10,  PFF  showed  a  better  correspondence  than  DWS,  with  LRU 
showing  a  poor  correspondence.  Again,  PFF  appears  to 
perform  slightly  better  than  DWS  for  large  values  of  S. 

This  data  showed  that  the  lifetime  knee  criterion  began 
to  break  down  when  the  mean  paging  I/O  service  time  became 


Table 3.4.3.2  DWS and PFF load ranges for the throughput plateaux


Table 3.4.3.3  DWS lifetime knee and throughput plateaux relationship


Table 3.4.3.4  PFF lifetime knee and throughput plateaux relationship


Table 3.4.3.5  LRU lifetime knee and throughput plateaux relationship




much larger than the knee lifetime (for all memory
policies). A similar observation has been made in [D13,K1].
The reason is that S represents the system delay in
responding to a transition. If S >> LT and LT is at a knee,
then S is long compared to the phase times of the program,
and the load control can no longer adapt in a timely
fashion. PFF does better than DWS in this case, because its
tendency to overshoot at transitions gives it a larger
resident set and insulates it from this problem. Further
evidence is found in Figure 3.2.1, which shows that the knee
lifetime grows as the reference string length grows.


The effect of the mean paging service time (S) on the
performance of the knee criterion is displayed in Table
3.4.3.6. The observation is that increasing S significantly
beyond the primary knee lifetime, while holding all other
parameter values fixed, weakens the knee criterion. The
relative percentage difference between the lifetime knee
throughput and the maximum throughput increases as S
increases.

The relative percentage difference by which the throughput
under a lifetime knee criterion differed from the maximum
throughput can be deduced for the DWS, PFF, and LRU policies
from Tables 3.4.3.3-3.4.3.5. The relative percentage
differences for DWS and PFF, averaged over the eight
reference strings, were comparable, with PFF having a
slightly lower difference for both values of S. LRU had the
highest average relative percentage difference,
approximately double the DWS value for both values of S.


To summarize the results of this section: the DWS knee
criterion worked well, as expected. The PFF and LRU knee
criteria also showed good correspondence. The DWS and PFF
knee criteria were comparable when the average relative
difference between knee throughput and maximal throughput
was considered. DWS generally allowed a higher load and
wider load ranges for plateaux on the throughput curves than
PFF. The mean paging service time (S) affected all our
results. Increasing the value of S well beyond the knee
lifetime ruined the knee criterion. In other words, load
controllers should ensure that program lifetime values do
not become small with respect to mean paging service time.
(See [D13] for further discussion.)


Table 3.4.3.6  Effect of S on the knee criterion for
reference string P6 and the DWS policy




3.5  More about controller criteria

We have emphasized the need for a load controller to be
associated with the memory management policy. The DWS
policy controls the load indirectly via the setting of its
parameter (window size). The setting is made so as to
observe locality at some identifiable level. One can
conceive of changing the DWS window size so as to adapt to
the changing locality conditions in the program. Smith
proposes (in effect) to reduce window size at transition
time, thereby removing unreferenced pages of the old
locality set and damping increases in working set size
[S14]. Prieve assigns a different (fixed) window size to
each page of a program in his Page Partition replacement
policy and shows that it improves in fault rate at the same
mean memory size over DWS [P1]. The difficulty with these
two methods is that the improvements (at least on the
average) are not significant enough to justify the cost of
implementation. There are other methods which adjust the
window size to control the paging rate. The Tenex system is
said to monitor the local page fault rate of a program,
modifying the window size dynamically in an attempt to
control it [B12]. Denning and Kahn show that, in certain
cases, setting a program's window size to cause the average
system lifetime value to approximate the average paging
service time will approximate maximal system throughput in a
queueing network model [D12]. There is, however, little
system-level measurement data supporting these proposals.
Indeed, K.C. Lynch, in a private communication (Spring 1976)
describes measurements of TENEX in which the memory
space-time versus window size T curve was found to have a
wide flat minimum, being within 5 percent of minimum from
T=100 ms to T=5 sec. A controller, which adjusted T to seek
a "more favourable" fault rate, kept it in this range - but
had no discernible effect on throughput. The controller in
effect generated random T-variation, overhead, and little
more.

Without question, the simplest possible action is to
hold the window size constant for the duration of the
program; either a global system value or a program-dependent
value is assigned to the program at its initiation and never
changed. The T-modifying proposals are based on assumptions
that controlling the window size during program execution
may be beneficial and improve performance. However, there
is some question whether dynamic window control is of any
use (as noted above). Chu and Opderbeck observed a wide
space-time plateau in their curves, indicating that the
choice of the threshold parameter value was not critical
[C2]. According to their data, it suffices for each program
to be assigned its own fixed parameter setting. In other
words, the essential (irremovable) controller overhead is in
classifying a program and selecting its proper parameter
setting. The number of possible values that can be selected
is a measure of irremovable overhead. In some cases, only
one value suffices, whence there is no overhead.

One must be concerned that the overhead involved in
dynamic parameter setting changes is more than offset by the
resulting gain in performance [S5]. Dynamic adjustment is
useful only if its cost is less than its reward. Because of
the questions raised, we investigated controller criteria
more fully. We considered plateaux for the minimum
space-time cost and the associated window sizes for the DWS
and PFF memory policies. The space-time plateaux are of
interest because the calculated ST approximates the actual
ST* = H/TP. If the ST curve exhibits a narrow plateau, then
so will the throughput curve.

Our results may depend on the set of programs actually
present in this study. If we discovered in our experiments
that a window of given size gave (say) a 10 percent level of
control for DWS, we would not expect that value of window
size to be correct for different programs. Instead, we are
presenting a methodology for assessing irremovable
controller overhead.


Our interest was in the interval(s) of window size
necessary to force each reference string of our ensemble to
operate within some specified range of its memory policy's
minimum space-time cost. We studied a 5 percent and a 10
percent plateau as illustrated in Figure 3.5.1. The
results are displayed in Figures 3.5.2 and 3.5.3. The
vertical line A shows setting T=73,000 will cause each of
our reference strings to operate on its 10 percent DWS
space-time plateau (though several - P2, P4, P7 - operate on
their extremes). One value of T is sufficient to establish
a 10 percent level of control for DWS on the space-time
plateau for these programs. Were a 5 percent level of
control required, at least two values of T, shown by the
vertical lines B1 and B2 (50,000 and 118,000 respectively),
are needed. It is not necessary to change between these
two values during program execution; only an initial
selection between the two is needed. The results for PFF
are different. PFF will need at least three threshold
values near 5,000, 15,000, and 50,000, to give a 10 percent
level of control for these reference strings. At the 5
percent level, PFF will need at least four threshold values
near 500, 5,000, 15,000, and 73,000.


Figure 3.5.1  5 and 10 percent plateaux on the
space-time curve

Figure 3.5.2  DWS window sizes for 5 percent and 10 percent
plateaux on the space-time curve

Figure 3.5.3  PFF threshold window sizes for 5 percent and
10 percent plateaux on the space-time curve


These findings suggest that PFF is inherently more
difficult to control than DWS because it requires more
distinct parameter values to achieve a comparable level of
performance over a set of reference strings. A
well-designed PFF controller is likely to generate more
overhead than a well-designed DWS controller, which may
offset the benefits of PFF's simpler implementation.
Indeed, if the manager of a computing system is willing to
let each program operate under DWS within 10 percent of its
minimum space-time cost, then a DWS controller will generate
no overhead because it uses one parameter value (73,000) for
all programs.
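
The counting of distinct parameter values can be sketched in
code. The following is a minimal illustration under stated
assumptions, not the procedure used in this study: the helper
names plateau_interval and min_parameter_values are
hypothetical, the space-time curve is assumed to be sampled at
discrete window sizes, and all numbers in the usage note are
made up.

```python
def plateau_interval(curve, tolerance):
    """Return (low_T, high_T) of the plateau around the minimum.

    curve     -- list of (T, space_time_cost) pairs sorted by T (assumed input)
    tolerance -- 0.05 for a 5 percent plateau, 0.10 for a 10 percent plateau
    The plateau is the contiguous run of sampled T values around the
    minimum whose cost is within (1 + tolerance) times the minimum cost.
    """
    costs = [cost for _, cost in curve]
    k = min(range(len(costs)), key=costs.__getitem__)
    limit = costs[k] * (1.0 + tolerance)
    lo = hi = k
    while lo > 0 and costs[lo - 1] <= limit:
        lo -= 1
    while hi < len(costs) - 1 and costs[hi + 1] <= limit:
        hi += 1
    return curve[lo][0], curve[hi][0]


def min_parameter_values(intervals):
    """Fewest parameter values such that every interval contains one.

    intervals -- list of (low_T, high_T) plateau intervals, one per program
    Classic greedy choice: scan intervals by right endpoint and pick that
    endpoint whenever the interval is not yet covered by the last pick.
    """
    chosen = []
    for low, high in sorted(intervals, key=lambda iv: iv[1]):
        if not chosen or chosen[-1] < low:
            chosen.append(high)
    return chosen
```

With the made-up intervals [(10, 50), (30, 80), (60, 90),
(5, 20)], min_parameter_values returns [20, 80]: two distinct
settings are the least that places every program on its
plateau. The length of the returned list plays the role of the
irremovable controller overhead discussed above.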

The anomaly behaviour of PFF, described in Section 2.2.3
and Appendix C, shows that PFF presents other control
problems - varying the PFF threshold window size may not
produce the desired changes in operating points. For
example, increasing the threshold window size may
unexpectedly lead to a smaller mean memory allocation, or a
larger page fault rate, or both. No such behaviour is
possible for DWS [D15,F6]. The gap behaviour of PFF,
documented in Appendix C, also means that varying the PFF
threshold window size may produce undesirable results. For
reference string P5, a small increase in threshold window
size value produces a small increase in lifetime value, but
a large jump in mean resident set size. In other words, a
small upward adjustment in the PFF parameter may cause some
programs to place a sudden heavy demand on the memory
subsystem. No such behaviour has been observed in our
experiments for DWS.

To summarize: a load controller on the DWS memory
policy contains less essential overhead than a PFF
controller: it consistently requires fewer distinct
parameter settings for the program ensemble to achieve both
5 and 10 percent plateaux in the space-time curves.
Moreover, a DWS controller is more stable than a PFF
controller because it is not subject to anomalies or gap
behaviours.




CHAPTER FOUR - A STUDY OF SEMI-MARKOV MEMORY DEMAND MODELS


4.1  INTRODUCTION


This chapter investigates the utility of semi-Markov
models for the memory demand of programs operating under the
DWS and PFF policies. The states of such models are
resident set sizes, the holding times are durations of each
resident set size, and the transition matrix gives the
probabilities of transitions between pairs of resident set
sizes. Once these model parameters are known, mean resident
set sizes and page faulting rates can be calculated. The
parameters may be determined empirically, or they may be
deduced from data using assumptions about an underlying
program behaviour model.


Models of this type have two possible uses: they can be
used to derive formulae for program memory demands, and they
can be used to generate synthetic reference strings. In the
latter case, it may be necessary to specify a micromodel for
use during the holding time of a given memory size. Although
we do present some analysis of our models, our primary
interest in the models is the generation of synthetic
reference strings.

Semi-Markov models for memory demand have been studied
by others. They range from locality-based models, which
directly characterize program phase/transition behaviour, to
policy-based models, which characterize the resident set
sizes generated by the program under a given memory policy.
Denning and Kahn studied a locality-based model whose states
were locality sets and whose holding times were phases
[D11]; they compared the lifetime functions of synthetic
reference strings generated by this model against observed
lifetime functions in order to determine what assumptions on
phase/transition behaviour were essential. Starting from
the assumption that locality is characterized by the LRU
stack model, Chu and Opderbeck [C3] and Sadeh [S1] derived a
policy-based model for PFF; this model was intended as a
means of speeding up simulations, for possible changes in
memory demand need be attended to only after each phase,
rather than after each reference. The semi-Markov model has
no more descriptive power than the LRU stack model to which
it is equivalent. The principal attraction of the
locality-based models is their ability to capture directly
the phase/transition behaviour, while the attraction of the
policy-based models is their simplifying the calculation of
(resident set size, lifetime) operating points.


Our objective was similar to that of
Chu/Opderbeck/Sadeh, but we wanted a policy model consistent
with a locality assumption more general than the LRU stack
model. We decided to try an approach whereby we obtained
the semi-Markov model parameters by measurement; by using
this as a macromodel and coupling it with a micromodel, we
hoped to create a locality-based model similar to that of
Denning/Kahn but more realistic. This apparently reasonable
approach failed to produce a reasonable locality-based
model; indeed, it fared worse than the LRU stack model!
Though the investigation reached a disappointing conclusion,
the failure of an apparently useful approach contained
valuable lessons worth summarizing here. Our experience
points the way to approaches more likely to succeed.

The structure of the semi-Markov models and experimental
observations about parameter values are described in
Sections 4.3-4.5. The comparisons between the resulting
model and the actual reference strings are given in Section
4.6. We discuss a better way to approach the problem in
Section 4.7.


4.2  GENERAL DESCRIPTION


Our earlier discussion (Section 2.4) indicated that
semi-Markov models were promising because they corresponded
to the intuitive phase/transition model of program
behaviour. The phase/transition model describes program
behaviour at two levels of detail. At the upper level, the
macromodel, is a description of the program's memory
demands, holding times, and transitions between memory
demands. At the lower level, the micromodel, is a
description of the program's reference behaviour during
periods of constant memory demand.

For a policy-based program model, the states of the
macromodel represent memory demands, e.g., state i means the
resident set contains i pages. The memory demand transition
matrix M=[M(i,j)] specifies the probability of transitions
between pairs of states. The distribution H(i,x) gives the
probability that the holding time in state i is x time
units. Using these ideas, the reference string generation
process can be performed as follows. A configuration of a
semi-Markov model at time t is denoted by c(t) = (RS,i,rh),
where RS is the resident set, i is the state of the model at
time t, and rh is the residual holding time for state i at
time t. The next configuration at time t+1 is
c(t+1) = (RS',i',rh'), where [D14]:

1.  If rh=0 at t+1, a new state i' is entered with
    probability M(i,i'). A new resident set RS' is
    selected based on the particular assumptions of the
    semi-Markov model. A page reference is made based
    on the particular assumptions of transition
    behaviour. A new holding time h(i') is determined
    from H(i',x), and rh' is set to h(i'). Thus,
    c(t+1) = (RS',i',h(i')). Type 1 configuration
    changes are caused by the macromodel.

2.  If rh>0, i'=i, RS'=RS, and rh'=rh-1. Using the
    micromodel, a reference to some page in the
    resident set of size i is made. Thus,
    c(t+1) = (RS,i,rh-1). Type 2 configuration changes
    are caused by the micromodel.

It is important to note that, because these definitions are
formulated around the resident set size process of a given
memory policy, page faults are possible only at type 1
configuration changes. When the memory transition matrix
and holding time distribution have been defined, performance
measures such as the mean memory demand and page faulting
rate can be calculated.
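
The two configuration-change rules can be sketched as a small
generator. This is an illustration under stated assumptions,
not the generator of Appendix G: the function name
generate_reference_string is hypothetical, M and H are assumed
to be given as dictionaries of discrete distributions, the
resident set for state i is assumed to be pages 1..i, the
reference made at a type 1 change is assumed to be the
highest-numbered page of the new resident set, and the
micromodel is assumed to be cyclic.

```python
import random


def generate_reference_string(M, H, length, start_state, seed=0):
    """Generate `length` page references from a semi-Markov macromodel.

    M[i] -- dict mapping next state i' to probability M(i, i')   (assumed form)
    H[i] -- dict mapping holding time x to probability H(i, x)   (assumed form)
    Returns (refs, trace); trace[t] is (state, residual holding time)
    after the t-th reference.
    """
    rng = random.Random(seed)

    def draw(dist):
        outcomes, probs = zip(*dist.items())
        return rng.choices(outcomes, weights=probs)[0]

    i = start_state
    rs = list(range(1, i + 1))      # assumption: resident set is pages 1..i
    rh = draw(H[i])
    j = 0                           # index pointer of the cyclic micromodel
    refs, trace = [], []
    for _ in range(length):
        if rh == 0:
            # Type 1 change (macromodel): enter i' with probability M(i, i').
            i = draw(M[i])
            rs = list(range(1, i + 1))
            page = rs[-1]           # assumption: transition reference
            rh = draw(H[i])
            j = 0
        else:
            # Type 2 change (micromodel): reference inside the resident set.
            page = rs[j % i]
            j += 1
            rh -= 1
        refs.append(page)
        trace.append((i, rh))
    return refs, trace
```

A call such as generate_reference_string(M, H, 1000,
start_state=2) yields one synthetic string; fixing the seed
makes the string reproducible, which is convenient when the
same string is to be processed under several policies.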

The model above is nontrivial to use because it has many
parameters: for an n-page program with K points in each
H(i,x) distribution, there are n^2+n·K values required to
specify the macromodel; an LRUSM micromodel would require an
additional n parameters (the LRU stack distance
frequencies). With additional assumptions on the memory
policy or the forms of holding distribution, it is possible
to reduce the number of parameters. For example:

-  Under the DWS policy, resident set size transitions
can be only of the form i->i+j where j=-1, 0, or +1; the
transition matrix M has nonzero elements only in the
tridiagonal positions and can be stored as three vectors
in 3·n-2 storage cells.

-  Under the PFF policy, resident set size transitions
are of the form i->j, where 0<j<=i+1; the transition
matrix M has (n^2+3·n-4)/2 possible nonzero entries.

-  If the holding time distributions H(i,x) are assumed
to be the same for all i (i.e., H(x)), then n^2+K
parameters, for a K-point histogram, are needed.

-  If only the means of the H(i,x) for each state are
important (i.e., $\bar{h}(i)$), then n^2+n parameters are
needed.

-  If the rows of the transition matrix M are identical
and only the means of the H(i,x) are needed, then only
2·n parameters are needed. (This is similar to [D11],
but in a different context.)
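
To give a feel for the sizes involved, the counts quoted above
can be tabulated for a particular program. The sketch below
only evaluates the expressions as stated in the text; the
function name macromodel_parameter_counts and the dictionary
keys are hypothetical labels.

```python
def macromodel_parameter_counts(n, K):
    """Evaluate the parameter counts quoted in the text.

    n -- number of pages referenced by the program
    K -- number of points in each holding time histogram
    """
    return {
        "general macromodel": n * n + n * K,
        "DWS transition matrix cells": 3 * n - 2,
        "PFF possible nonzero entries": (n * n + 3 * n - 4) // 2,
        "state-independent H(x)": n * n + K,
        "means only": n * n + n,
        "identical rows, means only": 2 * n,
    }
```

For n=100 and K=50 the general macromodel needs 15,000 values,
while the last, most restrictive case needs 200.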

Each of these parameter-reducing assumptions introduces
structure into the model which may distort the "true"
underlying locality behaviour. Note that sometimes it is
possible to increase the number of parameters to reduce some
other cost. For example, Chu/Opderbeck/Sadeh increase the
number of parameters from n (the number of LRU stack
distance frequencies) to n^2 to reduce the running time of
their algorithm. Our interest in parameter reduction is
stimulated by the desire for compact models of programs.


4.3  ANALYSIS OF A POLICY MODEL CHAIN


4.3.1  Mean  Memory  Demand 


The advantage of a policy model is the ease of computing
memory demands (mean resident set sizes) and fault rates (or
lifetimes). The memory transition matrix M(i,j) describes
the embedded Markov chain of the resident set size process;
it is assumed to be irreducible, aperiodic, and homogeneous
with all states recurrent nonnull. Then, the equilibrium
probability distribution $\pi = (\pi(1),\ldots,\pi(n))$,
which is the normalized solution of

$$\pi = \pi \cdot M,$$

exists; $\pi(i)$ is the equilibrium probability of observing
resident set size i, at a time when the resident set size
changes.

The mean memory demand is computed over all virtual time
according to the formula

$$m = \sum_i i \cdot p(i),$$

where p(i) is the proportion of virtual time in which the
resident set size is i. Of K resident set size changes,
$K \cdot \pi(i)$ of them begin holding time intervals of mean
length $\bar{h}(i)$; thus, the fraction of time state i holds
is proportional to $\bar{h}(i) \cdot \pi(i)$, whence

$$p(i) = \frac{\bar{h}(i)\,\pi(i)}{\sum_j \bar{h}(j)\,\pi(j)},$$

where the denominator is a normalizing constant.
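
As a worked illustration of this calculation, the sketch below
finds the equilibrium distribution by repeated multiplication
and then forms the time fractions and the mean memory demand.
It is a minimal sketch, not the computation used in this
study: the function names are hypothetical, power iteration is
one of several ways to solve for the equilibrium distribution,
and the three-state matrix and mean holding times in the usage
note are made up.

```python
def equilibrium(M, max_iterations=10000, tolerance=1e-12):
    """Approximate the normalized solution of pi = pi * M.

    M is a row-stochastic matrix given as a list of lists. Convergence of
    this iteration relies on the chain being irreducible and aperiodic.
    """
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(max_iterations):
        nxt = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tolerance:
            return nxt
        pi = nxt
    return pi


def time_fractions(pi, hbar):
    """p(i): fraction of virtual time spent in state i."""
    weights = [h * q for h, q in zip(hbar, pi)]
    total = sum(weights)            # the normalizing constant
    return [w / total for w in weights]


def mean_memory_demand(p, sizes):
    """m: sum over states of (resident set size) * p(i)."""
    return sum(size * frac for size, frac in zip(sizes, p))
```

For the made-up tridiagonal matrix with rows (0.5, 0.5, 0),
(0.25, 0.5, 0.25), (0, 0.5, 0.5), the equilibrium distribution
is (0.25, 0.5, 0.25). With mean holding times 10, 20, and 40
for sizes 1, 2, and 3, the time fractions are (1/9, 4/9, 4/9)
and the mean memory demand is 21/9, about 2.33 pages.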




4.3.2  DWS  Paging  Rate 


The states of the semi-Markov chain for the DWS policy
model are the possible working set sizes. Changes in state
for this model represent changes in the working set size,
which can have the increments -1, 0, or +1. A page fault
reference will cause an increment of either 0 or +1, whereas
any other reference will cause an increment of either -1 or
0. By defining holding times to be intervals bounded by
page faults or decreases in working set size, the increment
0 is unambiguously associated with a page fault. We denote
the states of the chain by 1,2,...,n, where n is the number
of pages referenced by the program.

The parameters of the DWS policy embedded Markov chain
for a given window size T can be determined by processing a
program under DWS. Every transition of the form i -> i or
i -> i+1 is associated with a page fault. The missing page
rate of DWS processing the model with the same window size T
used to gather model parameters is

$$F = 1 - \sum_i \frac{p(i)\,M(i,i-1)}{\bar{h}(i)}.$$


This formula can be derived as follows. Suppose the
experiment is run for K references. The total time in
resident set size i is $p(i) \cdot K$, by the definition of
p(i). During this total time, there were e(i) transitions
out of state i where

$$e(i) = \frac{p(i) \cdot K}{\bar{h}(i)}.$$

This may be deduced by observing that

$$\bar{h}(i) = \frac{\text{total time in } i}{\text{\# of exits from } i} = \frac{p(i) \cdot K}{e(i)}.$$

The total number of hits (non-faulting references) in state
i is $H(i) = e(i) \cdot M(i,i-1)$, because an i -> i-1
transition represents a hit. Thus, the hit rate H is

$$H = \sum_i \frac{H(i)}{K} = \sum_i \frac{p(i) \cdot K}{\bar{h}(i)} \cdot \frac{M(i,i-1)}{K} = \sum_i \frac{p(i)\,M(i,i-1)}{\bar{h}(i)}.$$

The previous formula follows because F = 1 - H. This
argument shows that the formula for H is exact in terms of
the empirical definitions used to define p(i), M(i,j), and
$\bar{h}(i)$.


4.3.3  PFF Paging Rate


The states of the PFF chain are PFF memory sizes.
The parameters of the embedded Markov chain for
a PFF-based policy model are determined by processing the
program for a given PFF threshold window value. Since every
transition i -> j, for j <= i+1, implies a page fault (PFF
changes resident sets only at page fault times), the PFF
fault rate is easily specified as

$$F = \sum_i \frac{p(i)}{\bar{h}(i)} = \sum_i \frac{\bar{h}(i)\,\pi(i)}{\sum_j \bar{h}(j)\,\pi(j)} \cdot \frac{1}{\bar{h}(i)} = \frac{1}{\sum_j \pi(j)\,\bar{h}(j)}.$$

This formula may be derived in a manner similar to that of
the DWS model fault rate. As before, there are
$e(i) = [p(i) \cdot K]/\bar{h}(i)$ transitions out of state i
during the K observed references. For PFF, every one of
these represents a page fault; the total number of faults in
state i is then e(i) and the fault rate F is:

$$F = \sum_i \frac{e(i)}{K} = \sum_i \frac{p(i) \cdot K}{\bar{h}(i) \cdot K} = \sum_i \frac{p(i)}{\bar{h}(i)}.$$

This formula for the fault rate is exact for the given
empirical definitions of p(i) and $\bar{h}(i)$.
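
The exactness claim can be checked numerically on a small
trace. The sketch below is illustrative only: the function
name pff_fault_rates is hypothetical, the trace is assumed to
be recorded as (resident set size, holding time) intervals
each of which ends in a page fault, and the intervals in the
usage note are made up.

```python
from collections import defaultdict


def pff_fault_rates(intervals):
    """Return (formula, direct) PFF fault rates for a recorded trace.

    intervals -- list of (resident_set_size, holding_time) pairs; under the
                 assumption used here every interval ends in one page fault.
    formula   -- sum over i of p(i) / hbar(i), from empirical p and hbar
    direct    -- number of faults divided by number of references K
    """
    K = sum(holding for _, holding in intervals)
    time_in = defaultdict(int)      # total time in each state
    exits = defaultdict(int)        # e(i): transitions out of each state
    for size, holding in intervals:
        time_in[size] += holding
        exits[size] += 1
    p = {i: time_in[i] / K for i in time_in}
    hbar = {i: time_in[i] / exits[i] for i in time_in}
    formula = sum(p[i] / hbar[i] for i in p)
    direct = len(intervals) / K
    return formula, direct
```

For the made-up trace [(3, 40), (4, 10), (2, 25), (3, 20),
(4, 5)] there are K=100 references and 5 faults; both values
returned are 0.05, up to floating-point rounding, and the
corresponding lifetime is 20 references.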

4.4  HOLDING TIME DISTRIBUTION


In a general semi-Markov model, the holding time
distribution is conditioned on the current state.
Restrictions on the general model are necessary to reduce
the number of parameters, as described in Section 4.2. The
general state-dependent distribution H(i,x) may be
restricted, becoming a state-independent distribution H(x).
H(i,x) may also be restricted to a distribution described
only by its mean value $\bar{h}(i)$ (e.g., a geometric
distribution with parameter $1/\bar{h}(i)$). This case can be
restricted further to use a single state-independent mean
value $\bar{h}$. These restrictions all represent a
significant reduction in model parameters. The central
question is whether these approaches represent too great a
sacrifice in realism.

The method of using a single, state-independent, mean
value $\bar{h}$ gives poor results; Figure 4.4.1 shows a
typical result. Although the PFF mean memory demand is
similar for both the $\bar{h}$ method and the $\bar{h}(i)$
method, the page fault rates differ significantly.

While  gathering  model  parameter  values  we  measured 
empirical  holding  time  distributions.  The  coefficients  of 
variation  for  these  distributions  were  3.49  for  DWS  and  3.61 
for  PFF,  averaged  over  six  reference  strings.  This 
indicates  that  the  holding  times  for  both  policies  must  be 
approximated  by  a  distribution  with  a  large  variance,  such 
as  a  (discrete)  hyperexponential.  An  exponential 
distribution  (whose  coefficent  of  variation  is  1.0)  would  be 
a  poor  approximation  to  an  empirical  holding  time 
distribution.  Note  that  an  LRUSM,  whose  lifetime  intervals 
are  geometrically  distributed,  would  be  assured  of  a  good 
fit  to  observed  performance  if  the  empirical  lifetime 
distributions  were  exponential.  However,  the  LRDSM  does  not 
provide  a  good  fit  [D14,S16];  it  is  thus  not  surprising  that 
empirical  holding  time  distributions  are  not  exponential. 
Ghanem  has  also  observed  this  phenomenon  [G4]. 
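The role of the coefficient of variation can be illustrated with a small sketch. The mixture weights and means below are hypothetical values chosen only so that both samples have a mean near 100; they are not taken from the measured distributions. The sketch compares a single geometric distribution (the discrete analogue of the exponential) with a two-branch mixture of geometrics, one simple form of discrete hyperexponential.

```python
import math
import random
import statistics

def coefficient_of_variation(samples):
    """Standard deviation divided by mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)

def geometric(mean, rng):
    """Geometric holding time on {1, 2, ...} with the given mean (> 1)."""
    p = 1.0 / mean
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))

def mixed_geometric(short_mean, long_mean, p_short, rng):
    """Two-branch mixture of geometrics (a discrete hyperexponential)."""
    mean = short_mean if rng.random() < p_short else long_mean
    return geometric(mean, rng)

rng = random.Random(1977)
# Hypothetical parameters: both samples have a mean of about 100.
single = [geometric(100, rng) for _ in range(20000)]
mixed = [mixed_geometric(10, 1810, 0.95, rng) for _ in range(20000)]

cv_single = coefficient_of_variation(single)   # close to 1
cv_mixed = coefficient_of_variation(mixed)     # well above 1
```

With many short holding times and a few very long ones, the mixture reaches a coefficient of variation far above 1, which a single geometric or exponential distribution cannot do.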

Because the overall holding time distribution H(x) has
a high coefficient of variation, we decided to use it as the
(state-independent) holding time distribution.




Figure  4.4.1  Example  of  the  poor  results  of  assuming 
a  state-independent  mean  holding  time 




4.5 REFERENCE STRING GENERATION

This  section  outlines  our  approach  to  synthetic 
reference string generation using DWS and PFF policy-based
semi-Markov program behaviour models. Appendix G gives
further  details. 

First,  we  must  specify  a  macromodel.  We  measure  the 
memory transition matrix M(i,j) for each policy on an actual
reference string, using as the policy parameter a value at
the lifetime knee. This operating point is the one most
closely coupled with single-step transition behaviour (see
Section 3.3). During this processing, the empirical H(x)
distribution  is  measured  as  well. 
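A minimal sketch of this measurement step follows. It assumes the policy simulation has already produced a trace of resident set sizes, one per reference; the function name `measure_macromodel` and the trace format are illustrative assumptions, not the procedure of Appendix G. Each change of size contributes one entry to the transition counts and one completed holding time.

```python
from collections import Counter, defaultdict

def measure_macromodel(sizes):
    """sizes[t] is the resident set size after reference t (assumed input).

    Returns (M, H): M[i][j] estimates the probability that a change of
    state out of i leads to j; H[x] is the fraction of completed holding
    times equal to x references. The final, incomplete run is ignored.
    """
    counts = defaultdict(Counter)
    holding = Counter()
    run_start = 0
    for t in range(1, len(sizes)):
        if sizes[t] != sizes[t - 1]:
            counts[sizes[t - 1]][sizes[t]] += 1
            holding[t - run_start] += 1
            run_start = t
    M = {i: {j: c / sum(row.values()) for j, c in row.items()}
         for i, row in counts.items()}
    total = sum(holding.values())
    H = {x: c / total for x, c in sorted(holding.items())}
    return M, H

# Hypothetical trace: sizes 1,1 then 2,2,2 then 3 then 2,2.
M, H = measure_macromodel([1, 1, 2, 2, 2, 3, 2, 2])
```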

Second, we must specify a micromodel. Deterministic and
stochastic forms are possible choices [D11]. A
deterministic micromodel generates a page reference
substring reflecting a memory demand of i pages in an
orderly manner. Denote the pages of resident set i by
1,2,...,i. During the holding time of state i, a
deterministic micromodel uses an index pointer j on the list
of pages of i to select page references. A cyclic
micromodel chooses the next value of j as (j+1) mod (i).
This produces a reference substring
1,2,...,i,1,2,...,i,1,..., which corresponds to reference
patterns for which LRU will generate one page fault per
reference whenever the LRU memory size is less than i. A
sawtooth micromodel sweeps the j pointer up and down the
page list, producing a substring 1,2,...,i,i,...,2,1,1,....
This corresponds to patterns for which LRU will be optimal,
or nearly so [D10]. Other possibilities exist, but were not
considered because the validation experiments indicated that
deterministic micromodels are inferior to stochastic ones.
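The two deterministic micromodels can be sketched as follows. The function names are illustrative, and pages are numbered 1..i, so the cyclic pointer is advanced as (k mod i) + 1 rather than literally as (j+1) mod (i). A small LRU simulator is included to check the fault behaviour described above.

```python
from collections import OrderedDict

def cyclic(i, length):
    """Reference substring 1,2,...,i,1,2,...,i,1,..."""
    return [(k % i) + 1 for k in range(length)]

def sawtooth(i, length):
    """Reference substring 1,2,...,i,i,...,2,1,1,2,..."""
    period = list(range(1, i + 1)) + list(range(i, 0, -1))
    return [period[k % (2 * i)] for k in range(length)]

def lru_faults(refs, frames):
    """Count page faults of LRU replacement with `frames` page frames."""
    stack = OrderedDict()            # least recently used entry first
    faults = 0
    for page in refs:
        if page in stack:
            stack.move_to_end(page)
        else:
            faults += 1
            if len(stack) >= frames:
                stack.popitem(last=False)   # evict the LRU page
            stack[page] = True
    return faults

# With 4 pages and 3 frames, the cyclic pattern faults on every reference,
# while the sawtooth pattern faults far less often.
cyclic_faults = lru_faults(cyclic(4, 40), 3)
sawtooth_faults = lru_faults(sawtooth(4, 40), 3)
```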


A stochastic micromodel generates a page reference
substring over a given set of i pages by using a pseudo-random
technique to specify the values of the random
variable j indexing the resident set page list. The program
behaviour models of Section 2.4 - all stochastic - can
conveniently serve as stochastic micromodels. As global
models in Section 2.4, these models referenced an entire
name space; as micromodels, they will reference a subspace
corresponding to a given resident set.

Third,  we  must  specify  a  representation  of  the  current 
locality  set  for  the  micromodel  reference  generation.  The 
general  approach  is  to  arrange  the  pages  in  a  list,  such 
that  the  first  i  pages  (when  the  state  is  i)  are  the  current 
locality. The micromodels then move their pointers through
the allowed portion of the list. The LRUSM micromodel
treats the list as an LRU stack.
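As an illustration of this list representation, the following sketch generates references from an LRU stack model restricted to the first i entries of the page list. The name `lrusm_references` and the stack distance probabilities are hypothetical; the sketch assumes the list is ordered most recently used first and that at least i distance probabilities are supplied.

```python
import random

def lrusm_references(page_list, i, distance_probs, length, rng):
    """Generate `length` references over the first i entries of page_list.

    The prefix is treated as an LRU stack: a stack distance d in 1..i is
    drawn with probability proportional to distance_probs[d-1], the page
    at that depth is referenced, and it moves to the top of the stack.
    page_list is updated in place so its order carries over to the
    next state.
    """
    refs = []
    distances = list(range(1, i + 1))
    for _ in range(length):
        d = rng.choices(distances, weights=distance_probs[:i])[0]
        page = page_list.pop(d - 1)
        page_list.insert(0, page)
        refs.append(page)
    return refs

# Hypothetical example: a 10-page name space with a 4-page locality.
pages = list(range(1, 11))
refs = lrusm_references(pages, 4, [0.4, 0.3, 0.2, 0.1], 1000, random.Random(0))
```

Pages beyond position i are never touched, so only the current locality is referenced while the state is i.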

Fourth, we must specify a representation of transition
behaviour. An important question is the amount of overlap
between the locality sets for states i and j. Our initial
method was to change one page inside the j-portion of the
list at transition time, by exchanging it with some page
outside the j-portion. This approach models the single page
fault usually associated with a resident set size change in
a policy-based model.

Our experiments showed that synthetic reference strings
generated by this method could not reproduce the observed
lifetime function well. We attempted a more sophisticated
transition model based on the observations in [M1]. For a
parameter  p,  we  discarded  p*j  pages  upon  entering  state  j; 
this  seemed  to  account  better  for  disruptive  transition 
behaviour.  (Our  earlier  method  corresponds  to  p=1/j.)  This 
gave  poorer  results  than  the  previous  method. 
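Both transition methods can be expressed as one exchange operation on the page list. In the sketch below the choice of which pages to exchange is made uniformly at random, and p*j is rounded to the nearest integer with a minimum of one page; both are assumptions, since the text does not specify them. The name `enter_state` is illustrative.

```python
import random

def enter_state(page_list, j, p, rng):
    """Exchange about p*j pages of the j-portion with pages outside it.

    Returns the number of pages exchanged. With p = 1/j exactly one page
    is exchanged, which corresponds to the initial method.
    """
    n_swap = max(1, round(p * j))
    n_swap = min(n_swap, j, len(page_list) - j)   # cannot exceed either side
    inside = rng.sample(range(j), n_swap)
    outside = rng.sample(range(j, len(page_list)), n_swap)
    for a, b in zip(inside, outside):
        page_list[a], page_list[b] = page_list[b], page_list[a]
    return n_swap

# Hypothetical example: entering state j = 4 with p = 0.5 replaces 2 pages.
pages = list(range(10))
exchanged = enter_state(pages, 4, 0.5, random.Random(0))
```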


4.6 MODEL TESTING RESULTS


We conducted a series of experiments to generate
synthetic  reference  strings  from  policy-based  semi-Markov 
program  behaviour  models.  The  experimental  method  comprised 
four  general  steps: 


-  Measure  the  memory  transition  matrix  and  the  overall 
memory  holding  time  distribution  using  the  memory  policy 
with  its  parameter  value  set  near  the  lifetime  knee  of 
an  actual  reference  string.  Select  the  micromodel  and 
transition  generation  method  to  be  used. 

-  Generate  a  reference  string  of  1,299,968  references 
using  the  model  with  the  above  parameters. 

-  Compute  the  lifetime  function  of  the  synthetic 
reference  string  for  various  memory  policies. 


- Compare the triplets (x,LT(x),parameter) against the
original  triplets  for  the  actual  reference  string. 


The rationale for this method of testing is simple: the
lifetime function is sufficient for characterizing program
load in queueing networks; if our models cannot reproduce
lifetime behaviour well, they may be of little use.
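One point of a DWS lifetime curve can be computed from a reference string as in the sketch below. It assumes the usual working set definition (a page is resident if it was referenced within the last T references) and takes the lifetime as the mean number of references per fault; the function name and the choice to charge the initial loading faults are illustrative assumptions.

```python
from collections import Counter, deque

def dws_operating_point(refs, T):
    """Return (x, LT): mean working set size and lifetime for window T."""
    window = deque()     # the last T references
    counts = Counter()   # occurrences of each page inside the window
    faults = 0
    size_sum = 0
    for page in refs:
        if counts[page] == 0:        # page not in the working set: fault
            faults += 1
        window.append(page)
        counts[page] += 1
        if len(window) > T:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        size_sum += len(counts)      # working set size after this reference
    n = len(refs)
    return size_sum / n, n / faults

# Hypothetical short string with window T = 2.
x, lifetime = dws_operating_point([1, 2, 1, 2, 3, 1], 2)
```

Repeating the computation over a range of T values yields the triplets (x, LT(x), T) that are compared between the actual and synthetic strings.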


Four combinations of (policy-based) macromodel and
micromodel were tested: DWS/sawtooth, DWS/LRUSM,
PFF/sawtooth, and PFF/LRUSM. For comparison, a global LRUSM
was  included  in  the  tests.  For  each  of  six  actual  reference 
strings  we  generated  five  synthetic  reference  strings 
according  to  these  five  models.  For  each  synthetic 
reference  string,  we  computed  the  DWS,  PFF,  and  VMIN 
lifetime  curves.  For  each  policy,  we  computed  the  relative 
error  between  the  observed  lifetime  value  and  the  synthetic 
lifetime  value  at  selected  grid  points.  Appendix  H  contains 
the  details  of  our  experimental  results. 
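The comparison step might be computed along the following lines. The exact error measure is not stated here, so this sketch assumes a mean of 100*|synthetic - actual|/actual over the grid points, with each lifetime curve linearly interpolated between its measured operating points; all names are illustrative.

```python
def lifetime_at(curve, x):
    """Linearly interpolate a lifetime curve, given as (x, LT) pairs
    sorted by strictly increasing x, at memory size x."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the measured range of the curve")

def relative_percentage_difference(actual, synthetic, grid):
    """Mean relative difference, in percent, over the grid points."""
    diffs = []
    for x in grid:
        a = lifetime_at(actual, x)
        s = lifetime_at(synthetic, x)
        diffs.append(100.0 * abs(s - a) / a)
    return sum(diffs) / len(diffs)

# Hypothetical curves: the synthetic lifetime is half the actual one.
actual = [(10, 100.0), (20, 300.0)]
synthetic = [(10, 50.0), (20, 150.0)]
error = relative_percentage_difference(actual, synthetic, [10, 15, 20])
```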

We  observed  early  on  that  deterministic  micromodels  are 
an  inadequate  representation  of  actual  reference  behaviour 
during  phases.  Figures  4.6.1  and  4.6.2  give  the  DWS  lifetime 
curves  for  the  DWS  and  PFF  policy  models  with  the  sawtooth 
micromodel for reference string P3. The other synthetic
lifetime  curves  generated  from  deterministic  micromodels 
exhibited  similar  behaviour.  In  general,  deterministic 
micromodels  gave  high  relative  errors  (see  Table  4.6.1)  and 
a  poor  lifetime  fit  near  the  origin.  Whereas  the  standard 
set  of  memory  policy  parameter  values  produced  operating 
points  near  the  origin  for  actual  reference  strings,  the 
same  set  produced  a  cluster  of  operating  points  away  from 
the  origin  for  synthetic  reference  strings.  With  these 
observations,  we  paid  no  further  attention  to  deterministic 
micromodels. 

We also observed that a deterministic holding time
distribution, with mean value h(i) whenever the resident set
size was i, gave poorer results than a state-independent
empirical holding time distribution. (As noted earlier,
this may result from H(x) more properly representing the
coefficient of variation.)

Our third observation is that all the models were unable
to reproduce important lifetime curve features, such as
knees and convex or concave regions (see Appendix H). They
also failed to reproduce the (x,LT(x),parameter) triplets at
which the data was gathered. Table 4.6.2 summarizes the
relative percentage difference between actual and synthetic
lifetime values averaged over all six reference strings for
the three memory policies. Based on goodness of lifetime
fit, the LRUSM ranks first, followed by the DWS policy
model, then by the PFF policy model.

These  observations  lead  to  the  main  conclusion:  the 
extra  complexity  of  the  policy-based  semi-Markov  models  does 
not help. The LRUSM, which has the fewest parameters among
the models tested, gives the best fit. Even so, the relative
error exhibited by the LRUSM is poor, suggesting
deficiencies in the LRUSM itself [S18].

The  failure  of  policy-based  models  appears  to  derive 
from the structure of the policy macromodel. Our simple




Figure  4.6.1  DWS  lifetime  functions  for  the  actual  and 
DWS  deterministic  micromodel  P3  reference 
string 




Figure  4.6.2  DWS  lifetime  functions  for  the  actual  and 
PFF  deterministic  micromodel  P3  reference 
string 


Relative percentage difference on reference string P3
(rows: memory policy DWS, PFF, VMIN; columns: DWS deterministic
micromodel, PFF deterministic micromodel; the table entries are
not legible in this copy)

Table 4.6.1 Relative percentage difference between
actual and deterministic micromodel
synthetic lifetime curves


(rows: memory policy DWS, PFF, VMIN; columns: DWS model, PFF
model, LRUSM; the table entries are not legible in this copy)

Table 4.6.2 Summary of relative percentage
difference between actual and
synthetic lifetime curves




approach  attempted  to  determine  the  actual  locality 
behaviour  by  directly  observing  a  memory  policy’s  behaviour. 
In  retrospect,  we  see  that  this  can  be  highly  inaccurate. 
The DWS policy, for example, cannot fully adjust to a
locality  change  with  a  delay  less  than  T.  The  DWS  memory 
transition  matrix  can  only  represent  single-step  changes, 
but  not  the  disruptions  often  associated  with  transitions. 
Similarly,  the  single  holding  time  distribution  H  (x)  cannot 
be  made  to  generate  clusters  of  short  holding  times 
synchronized  with  locality  changes.  Apparently,  much  more 
care  is  needed  in  decomposing  observed  behaviour  into  phases 
and  transitions. 


4.7  A  BETTER  APPROACH 


The  failure  of  the  approach,  which  came  as  a  surprise 
considering  the  large  number  of  parameters  used  in  the 
model,  contained  useful  lessons: 

a) The actual data needs complete decomposition into
phases and transitions. The memory transition matrix
M(i,j) should represent only the transitions i -> j,
where i is a resident set size at the end of a phase and
j that at the beginning of the next, but with
intermediate transition data omitted. The holding time
distribution should be decomposed in two parts, one for
phases, the other for transitions.

b) The Markov chain of the policy model may grossly
misrepresent actual locality transitions. A locality
transition i -> j appears in our policies as a sequence
of single steps. The experiments show that a policy
model, which replaces a single event with a series of
steps, may fare poorly.

c)  The  lack  of  explicit  transition  data  causes  the 
generator’s  internal  locality  set  representation  to  fail 
to  track  the  true  locality  set.  By  thus  misrepresenting 
behaviour  of  locality  set  overlaps,  it  generates  an 
incorrect  interreference  distribution  and,  hence,  an 
incorrect  synthetic  lifetime  function. 

In sum, the failure does not derive from the
phase/transition concept, but from not collecting the data
properly. Remember that Denning and Kahn showed this
formulation to be fully capable of exhibiting correct
lifetime features, such as multiple knees [D11].

Kahn  developed  a  phase/transition  decomposition  method 
which  could  be  applied  in  our  context,  for  separately 
parameterizing a phase model (PM) and a transition model
(TM) [K1]. By examining the string of successive lifetime
intervals  under  the  DWS  policy,  Kahn  was  able  to  classify 
them  as  being  transitions  or  phases,  depending  on  how  high 
the  local  fault  rate  was.  With  this  method,  we  could 
collect  M(i,j)  information  only  when  i  and  j  are  adjacent 
phase  states.  Similarly,  the  macromodel  holding  time 
information could be collected for phases only. Kahn showed
that lifetimes between transition faults were i.i.d. with an
exponential distribution, which determines a transition
holding time once the mean lifetime in transition is
measured.
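The classification idea can be illustrated with a deliberately simplified sketch. It is not Kahn's procedure, whose details are given in [K1]; it assumes that a lifetime interval shorter than a fixed threshold signals a high local fault rate and therefore a transition. The threshold value and the function names are hypothetical.

```python
from itertools import groupby

def decompose(lifetimes, threshold):
    """Split a string of inter-fault lifetimes into phases and transitions.

    Consecutive intervals of the same class are merged into one episode,
    returned as (kind, total_duration, number_of_intervals).
    """
    episodes = []
    for is_transition, run in groupby(lifetimes, key=lambda x: x < threshold):
        run = list(run)
        kind = "transition" if is_transition else "phase"
        episodes.append((kind, sum(run), len(run)))
    return episodes

def mean_transition_lifetime(episodes):
    """Mean lifetime between faults inside transitions."""
    duration = sum(d for kind, d, _ in episodes if kind == "transition")
    count = sum(n for kind, _, n in episodes if kind == "transition")
    return duration / count

# Hypothetical lifetime string with a threshold of 50 references.
episodes = decompose([500, 3, 2, 4, 800, 900, 5], 50)
mean_gap = mean_transition_lifetime(episodes)
```

The phase episodes would then supply M(i,j) and the phase holding times, while the mean transition lifetime parameterizes the exponential distribution of the TM.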

A  reference  string  can  be  generated  by  repeating  these 
actions: 

a) use the PM to determine a new state and holding time;

b) generate references using a stochastic micromodel
such as the LRUSM;

c) employ the TM during a transition to generate changes
in the locality set.

We leave it as a future project to test this approach.
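The three actions can be arranged into a generator loop along the following lines. This is an untested design sketch under several assumptions that do not come from the text: the PM is a dictionary of next-state probabilities, the micromodel is a uniform choice within the locality (standing in for the LRUSM), each transition fault exchanges one locality page for an outside page, and the gaps between transition faults are exponential with a given mean. Every name and parameter is hypothetical.

```python
import random

def generate(pm, holding, mean_gap, faults_per_transition,
             n_pages, length, rng):
    """pm[i][j]: probability that phase state i is followed by state j.
    holding(i, rng): phase holding time for state i.
    The first `state` entries of `pages` represent the locality set;
    every state must be smaller than n_pages."""
    pages = list(range(n_pages))
    state = next(iter(pm))
    refs = []
    while len(refs) < length:
        # a) phase model: choose the next state and its holding time
        successors = list(pm[state])
        weights = [pm[state][j] for j in successors]
        state = rng.choices(successors, weights=weights)[0]
        hold = holding(state, rng)
        # b) phase: stand-in stochastic micromodel over the locality
        refs.extend(pages[rng.randrange(state)] for _ in range(hold))
        # c) transition model: bring in outside pages one fault at a time
        for _ in range(faults_per_transition):
            a = rng.randrange(state)
            b = rng.randrange(state, n_pages)
            pages[a], pages[b] = pages[b], pages[a]
            refs.append(pages[a])                 # faulting reference
            gap = int(rng.expovariate(1.0 / mean_gap))
            refs.extend(pages[rng.randrange(state)] for _ in range(gap))
    return refs[:length]

# Hypothetical two-state phase model alternating between 2 and 4 pages.
pm = {2: {4: 1.0}, 4: {2: 1.0}}
refs = generate(pm, lambda i, rng: 50, 3.0, 2, 10, 1000, random.Random(0))
```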

Having collected data for representing behaviour in the
PM and TM, it is no longer possible to calculate operating
points directly from the PM. However, we can calculate
$m = \sum_i i\,\pi(i)$ from the PM. The larger the fraction of time
covered by the PM as compared to the fraction of time
covered by the TM, the closer the quantity m will be to the
mean memory demand of the actual reference string. We can
compute a fault rate estimate (for the TM) as $F = R \cdot (f/t)$,
where f is the mean number of faults during a transition, t
is the mean transition holding time, and R is the transition
rate from the PM:

$$R = \frac{1}{\sum_i \pi(i)\, h(i)}$$

(Consider K transitions; these take expected time
$T = \sum_i (K\,\pi(i))\, h(i)$ to complete, and R = K/T.) Thus a good
estimate of operating points can be achieved by recomposing
the PM and the TM. The estimate improves in proportion to
the degree to which the program's behaviour is in fact
decomposable.
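A numerical sketch of these estimates follows. It reads the formulas above as m = sum of i*pi(i), R = 1/sum of pi(i)*h(i), and F = R*(f/t), where pi(i) is taken to be the PM probability of state i and h(i) the mean phase holding time in state i; that reading, the function name, and all the numbers are assumptions made for illustration.

```python
def phase_model_estimates(pi, h, f, t):
    """Return (m, R, F) from phase model and transition model parameters.

    pi[i]: PM probability of state i (assumed to sum to 1)
    h[i]:  mean phase holding time in state i
    f:     mean number of faults during a transition
    t:     mean transition holding time
    """
    m = sum(i * pi[i] for i in pi)                 # memory demand estimate
    R = 1.0 / sum(pi[i] * h[i] for i in pi)        # transition rate
    F = R * (f / t)                                # fault rate estimate
    return m, R, F

# Hypothetical two-state phase model.
pi = {10: 0.5, 20: 0.5}
h = {10: 1000.0, 20: 3000.0}
m, R, F = phase_model_estimates(pi, h, f=5, t=50.0)
```

For these values the K-transition argument gives T = K*(0.5*1000 + 0.5*3000) = 2000*K, so R = K/T = 1/2000.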




CHAPTER  FIVE  -  CONCLUSIONS  AND  FUTURE  WORK 

5.1  DISCUSSION 


The  work  of  this  thesis  is  divided  into  three  parts.  We 
have described program behaviour and its relation to
performance in Chapter 2. We have presented a detailed
study of program behaviour in light of the phase/transition
model in Chapter 3. We have described a study of the
ability of empirically derived semi-Markov policy models to
reproduce locality behaviour in Chapter 4.

The  principal  contributions  of  Chapter  2  are  as  follows. 
We  classified  memory  management  policies  according  to  their 
treatments of program locality and multiprogrammed load
control. We discussed fixed partition policies and three
variable partition policies (Classes V1, V2, and V3). The
Class V3 or so-called working set policies were shown to be
best because they correlate memory allocation with program
behaviour and employ automatic load control. When the DWS
(Denning working set) policy is made to operate at the
primary knee of its lifetime curve, it should behave near
optimum, a fact verified in Chapter 3. The PFF (page fault
frequency) policy, a Class V3 policy, was shown to have two
undesirable properties. The "PFF anomaly", proved in
Chapter 2 and observed experimentally in Chapter 3, was
characterized by nonmonotone behaviour of memory demand or
lifetime value with respect to the policy parameter. The
"PFF gap behaviour", observed experimentally, was
characterized  by  unexpectedly  large  jumps  in  memory  demand 
or  lifetime  value  with  respect  to  small  changes  in  the 
policy  parameter.  The  DWS  policy  can  be  proved  free  of 
anomalous  behaviour;  there  is  no  empirical  evidence  that  it 
exhibits  gap  behaviour.  An  interesting  future  investigation 
would  study  these  policies  in  a  production  computer  system, 
to  observe  whether  these  undesirable  features  are 
significant  in  practice. 

An  extensive  experimental  investigation  is  reported  in 
Chapter  3.  The  DWS  lifetime  function  was  shown  to  contain 
considerable  information  on  phase/transition  behaviour.  In 
particular,  each  (of  several)  knee  in  a  lifetime  function 
corresponds  to  phase  behaviour  at  some  level  (time  scale)  of 
observability.  The  strengths  of  the  knees  were  related  to 
the  relative  importance  of  nesting  in  the  reference  string; 
we  found  that  the  knees  were  more  likely  to  correspond  to 
nested,  rather  than  disjoint,  locality  sets.  The  slope  of 
the  lifetime  function  to  the  left  of  a  knee  was  determined 
by the phase reference patterns, at the knee by the single
transition behaviour, and to the right of the knee by
aggregated transition behaviour. We showed that comparisons
between DWS and VMIN lifetime functions could be used to
estimate memory allocation overshoot at transition time. We
further  showed  that  we  could  estimate  irremovable  overshoot 
(for  a  nonlookahead  memory  policy)  and  that  PFF  had  more 
irremovable  overshoot  than  DWS.  PFF  anomalous  and  gap 
behaviours  were  observed  in  practice. 

We  found  strong  correlation  between  lifetime  knees  and 
space-time minima, especially for the DWS policy. We
verified the near-optimality of operating DWS at the
lifetime primary knee (using simple queueing networks).
When the minimum control requirement of a memory policy is
measured by the least number of parameter values required to
cause all members of a program ensemble to be within some
tolerance of their space-time minima, PFF was found to be
inherently more difficult to control than DWS. PFF required
a different parameter setting for each two programs. DWS
required just one parameter setting to achieve a 10 percent
tolerance (of optimum space-time) and two for a 5 percent
tolerance, across our set of eight reference strings.


The principal contributions of Chapter 4 are as follows.
The policy-based semi-Markov models, in which states
corresponded to resident set sizes and most transitions
corresponded to page faults, did not prove useful as
locality macromodels. They contain too little information
to allow a synthetic generator to track a program's
locality set. In fact, a simple LRUSM, which itself
performed poorly, was better than the DWS or PFF
policy-based macromodels. The semi-Markov macromodel's primary
value would be realized if the parameters could be derived
from more basic locality assumptions. A proper model could
be obtained by separating the program reference string into
phases and transitions; we would obtain separate parameters
for a phase model and a transition model, and then recompose
them during synthetic generation.

Our main conclusion is that phase/transition models are
supportable by empirical data. Failure to represent
phase/transition behaviour properly may impede workload
characterization. Future work should investigate
decomposition techniques, minimal models for
phase/transition behaviour modes, use of these models for
workload characterization, and relations between
phase/transition activity and other resource activities such
as I/O requests.




BIBLIOGRAPHY 


The following abbreviations have been used in this
bibliography:

ACM : Association for Computing Machinery

IEEE : Institute of Electrical and Electronics Engineers

IFIP : International Federation for Information Processing

SIAM  :  Society  for  Industrial  and  Applied  Mathematics 

JACM  :  Journal  of  the  ACM 

Comm. ACM : Communications of the ACM

Comp. Surv. : Computing Surveys

J.  Res.  Develop.  :  Journal  of  Research  and  Development 

Sys.  J.  :  Systems  Journal 

SJCC  :  Spring  Joint  Computer  Conference 

FJCC  :  Fall  Joint  Computer  Conference 

J. Computing : Journal of Computing

A1. Aho, A.V., P.J. Denning, and J.D. Ullman. Principles of
optimal page replacement. JACM 18, 1 (January 1971)
80-93.

A2. Arvind, R.Y. Kain, and E. Sadeh. On reference string
generation processes. Proc. Fourth Symp. on Op.
Sys. Princ. (October 1973) 80-87.

B1. Bard, Y. A characterization of program paging in a
time-sharing environment. IBM Research Report G320-2083
(Feb. 1973).




B2. Bard, Y. Application of the page survival index (PSI) to
virtual-memory system performance. IBM J. Res.
Develop. 19, 3 (May 1975) 212-220.

B3. Baskett, F., K.M. Chandy, R.R. Muntz, and F.G. Palacios.
Open, closed, and mixed networks of queues with
different classes of customers. JACM 22, 2 (April
1975) 248-260.

B4. Batson, A.P. and R.E. Brundage. Measurement of the
virtual memory demands of Algol-60 programs. Proc.
ACM SIGMETRICS Symp. (Sept. 1975) 121-126.

B5. Batson, A.P. and A.W. Madison. Measurement of major
locality phases in symbolic reference strings.
Proc. Int. Symp. Comp. Perf. Mod., Meas. and Eval.,
Cambridge, Mass. (March 1976) 75-84.

B6. Belady, L.A. A study of replacement algorithms for a
virtual storage computer. IBM Sys. J. 5, 2 (1966)
78-101.

B7. Belady, L.A. Biased replacement algorithms for
multiprogramming. IBM T.J. Watson Research Center
Note NC 697 (March 1967).

B8. Belady, L.A. and C.J. Kuehner. Dynamic space-sharing in
computer systems. Comm. ACM 12, 5 (May 1969)
282-288.

B9. Belady, L.A., R.A. Nelson, and G.S. Shedler. An anomaly
in space-time characteristics of certain programs
running in a paging machine. Comm. ACM 12, 6 (June
1969) 349-353.

B10. Belady, L.A. and F.P. Palermo. On-line measurement of
paging behavior by the multivalued MIN algorithm.
IBM J. Res. Develop. 18, 1 (Jan. 1974) 2-19.

B11. Belady, L.A. and R.F. Tsao. Memory allocation and
program behavior under multiprogramming. IBM
Research Report RC 3469 (July 1971).

B12. Bobrow, D.G., J.D. Burchfiel, D.L. Murphy, and R.S.
Tomlinson. TENEX, a paged time sharing system for
the PDP-10. Comm. ACM 15, 3 (March 1972) 135-143.

B13. Brandwajn, A., J. Buzen, E. Gelenbe, and D. Potier. A
model of performance for virtual memory systems
(abstract only). Proc. ACM SIGMETRICS Symp.
(September 1974) 9.




B14. Brawn, B.S. and F.G. Gustavson. Program behavior in a
paging environment. Proc. FJCC (1968) 1019-1032.

B15. Brundage, R.E. and A.P. Batson. Computational
processor demands of Algol-60 programs. Proc. 5th
ACM Symp. on Op. Sys. Prin. (Nov. 1975) 161-167.

B16. Bryant, P. Predicting working set sizes. IBM J. Res.
Develop. 19, 3 (May 1975) 221-229.

B17. Buzen, J.P. Queueing network models of
multiprogramming. Ph.D. Thesis, Harvard University
(1971).

B18. Buzen, J.P. Computational algorithms for closed
queueing networks with exponential servers. Comm.
ACM 16, 9 (Sept. 1973) 527-531.

B19. Buzen, J.P. Cost effective tools for performance
evaluation. Proc. COMPCON Washington, D.C.
(Sept. 1975).

B20. Buzen, J.P. Fundamental laws of computer system
performance. Proc. Int. Symp. Comp. Perf. Mod.,
Meas. and Eval., Cambridge, Mass. (March 1976)
200-210.

C1. Chamberlin, D.D., S.H. Fuller, and L.Y. Liu. An analysis
of page allocation strategies for multiprogramming
systems with virtual memory. IBM J. Res. Develop.
17, 5 (Sept. 1973) 404-412.

C2. Chu, W.W. and H. Opderbeck. The page fault frequency
algorithm. Proc. FJCC (1972) 597-609.

C3. Chu, W.W. and H. Opderbeck. Analysis of the PFF
replacement algorithm via a semi-Markov model.
Comm. ACM 19, 5 (May 1976) 298-304.

C4. Coffman, E.G., Jr. and P.J. Denning. Operating Systems
Theory. Prentice-Hall (1973) 331 pp.

C5. Coffman, E.G., Jr. and T.J. Ryan, Jr. A study of storage
partitioning using a mathematical model of
locality. Comm. ACM 15, 3 (March 1972) 185-190.

C6. Coffman, E.G., Jr. and L.C. Varian. Further experimental
data on the behavior of programs in a paging
environment. Comm. ACM 11, 7 (July 1968) 471-474.

C7. Corbato, F.J. A paging experiment with the MULTICS
system. MIT Project MAC Report MAC-M-384 (July
1968).




C8. Courtois, P.J. Decomposability, instabilities, and
saturation in multiprogramming systems. Comm. ACM
18, 7 (July 1975) 371-377.

C9. Courtois, P.J. and H. Vantilborgh. A decomposable model
of program paging behaviour. Acta Informatica Vol.
6, 3 (1976) 251-276.

D1. Deniston, W.R. SIPE: a TSS/360 software measurement
technique. Proc. ACM Conf. (1969) 229-245.

D2. Denning, P.J. Resource allocation in multiprocess
computer systems. MIT Project MAC Report MAC-TR-50
(May 1968).

D3. Denning, P.J. The working set model for program
behavior. Comm. ACM 11, 5 (May 1968) 323-333.

D4. Denning, P.J. Virtual memory. Comp. Surv. 2, 3 (September
1970) 153-190.

D5. Denning, P.J. Comments on a linear paging model. Proc.
ACM SIGMETRICS Conf. (Sept. 1974) 34-48.

D6. Denning, P.J. The computation and use of optimal paging
curves. Technical Report CSD-TR-154, Computer
Sciences Dept., Purdue U. (June 1975).

D7. Denning, P.J. Program behavior, working sets, and
multiprogramming. Technical Report CSD-TR-194,
Comp. Sci. Dept., Purdue U. (June 1976).

D8. Denning, P.J. ACM Forum (reply to J.H. Saltzer). Comm.
ACM 19, 8 (Aug. 1976) 476-477.

D9. Denning, P.J., Y.C. Chen, and G.S. Shedler. A model for
program behavior under demand paging. IBM T.J.
Watson Research Center Report RC 2301 (December
1968).

D10. Denning, P.J. and G.S. Graham. Multiprogrammed memory
management. IEEE Proceedings on Interactive
Computer Systems Vol. 63, 6 (June 1975) 924-939.

D11. Denning, P.J. and K.C. Kahn. A study of program
locality and lifetime functions. Proc. 5th ACM
Symp. on Op. Sys. Prin. (Nov. 1975) 207-216.

D12. Denning, P.J. and K.C. Kahn. An L=S criterion for
optimal multiprogramming. Proc. Int. Symp. Comp.
Perf. Mod., Meas. and Eval., Cambridge, Mass.
(March 1976) 219-229.




D13. Denning, P.J., K.C. Kahn, J. Leroudier, D. Potier, and
R. Suri. Optimal multiprogramming. Manuscript (May
1976), to appear in Acta Informatica.

D14. Denning, P.J., J.E. Savage, and J.R. Spirn. Models for
locality in program behavior. Dept. of Electrical
Engineering, Princeton University, T.R. 107 (April
1972).

D15. Denning, P.J. and S.C. Schwartz. Properties of the
working set model. Comm. ACM 15, 3 (March 1972)
191-198. Corrigendum, Comm. ACM 16, 2 (Feb. 1973)
122.

D16. Denning, P.J. and D.R. Slutz. Generalized working set
and optimal measures for segment reference strings.
Technical Report CSD-TR-178, Computer Sciences
Dept., Purdue U. (March 1976).

D17. Denning, P.J. and J.R. Spirn. Dynamic storage
partitioning. Proc. 4th Symp. on Op. Sys. Prin.
(Oct. 1973) 74-79.

D18. Denning, P.J. and Tran-Quoc-Te. On the optimality of
working set policies. Technical Report CSD-TR-176,
Computer Sciences Dept., Purdue U. (March 1976).

D19. Doherty, W.J. Scheduling TSS/360 for responsiveness.
Proc. FJCC (1970) 97-111.

E1. Easton, M.C. and R. Fagin. Cold-start vs. warm-start
miss ratios and multiprogramming performance. IBM
T.J. Watson Research Center Report RC5715 (Nov.
1975).

F1. Ferrari, D. An analytic study of memory allocation in
multiprocessing systems. In [G3].

F2. Ferrari, D. Improving locality by critical working sets.
Comm. ACM 17, 11 (Nov. 1974) 614-620.

F3. Ferrari, D. Tailoring programs to models of program
behaviour. IBM J. Res. Develop. 19, 3 (May 1975)
244-251.

F4. Fine, G.H., C.W. Jackson, and P.V. McIsaac. Dynamic
program behavior under paging. Proc. ACM Conf.
(1966) 223-228.

F5. Fogel, M.H. The VMOS paging algorithm. ACM SIGOPS
Operating Systems Review 8, 1 (Jan. 1974) 8-17.




F6. Franklin, M.A., G.S. Graham, and R.K. Gupta. Anomalies
with variable partition paging algorithms.
Manuscript submitted for publication (April 1976).

F7. Freiberger, W.F., U. Grenander, and P.O. Sampson.
Patterns in program references. IBM J. Res.
Develop. 19, 3 (May 1975) 230-243.

G1. Gaver, D.P., P.A.W. Lewis, and G.S. Shedler. Analysis of
exception data in a staging hierarchy. IBM J. Res.
Develop. 18 (September 1974) 423-435.

G2. Gelenbe, E. A unified approach to the evaluation of a
class of replacement algorithms. IEEE Trans. Comp.
C-22, 6 (June 1973) 611-618.

G3. Gelenbe, E. and R. Mahl (eds.) Computer Architectures
and Networks. North-Holland/American Elsevier
(August 1974).

G4. Ghanem, M.Z. Experimental study on the behavior of
programs. IBM T.J. Watson Research Center Report
RC5460 (June 1975).

G5. Ghanem, M.Z. Dynamic partitioning of the main memory
using the working set concept. IBM J. Res. Develop.
19, 5 (Sept. 1975) 445-450.

G6. Ghanem, M.Z. Study of memory partitioning for
multiprogramming systems with virtual memory. IBM
J. Res. Develop. 19, 5 (Sept. 1975) 451-457.

G7. Ghanem, M.Z. and H. Kobayashi. A parametric
representation of program behavior in a virtual
memory. Proc. 8th Princeton Conf. on Info. Sci.
and Sys. (March 1974) 327-330.

G8. Gordon, W.J. and G.F. Newell. Closed queueing systems
with exponential servers. Operations Research 15
(1967) 254-265.

H1. Hatfield, D.J. and J. Gerald. Program restructuring for
virtual memory. IBM Sys. J. 10, 3 (1971) 168-192.

H2. Henderson, G. and J. Rodriguez-Rosell. The optimal
choice of window sizes for working set dispatching.
Proc. ACM SIGMETRICS Symp. (Sept. 1974) 10-33.

I1. IBM. OS/Virtual Storage 2 features supplement. Form
GC20-1753 (August 1972).

I2. IBM. System/360 Reference Data. Form GX20-1703.




J1. Jackson, J.R. Jobshop-like queueing systems. Management
Science 10, 1 (Oct. 1963) 131-142.

K1. Kahn, K.C. Program behavior and load dependent system
performance. Ph.D. Thesis, Comp. Sci. Dept., Purdue
U. (Aug. 1976).

K2. Kilburn, T., D.B.G. Edwards, M.J. Lanigan, and F.H.
Sumner. One-level storage system. IRE Trans. EC-11, 2
(April 62) 223-235.

K3. King, W.F. III. Analysis of demand paging algorithms.
Proc. IFIP 1971 Congress, 485-490.

L1. Lenfant, J. The delay network model of program
behaviour. In [G3].

L2. Leroudier, J. and D. Potier. Principles of optimality
for multiprogramming. Proc. Int. Symp. Comp. Perf.
Mod. Meas. and Eval., Cambridge, Mass. (March 1976)
211-218.

L3. Lewis, P.A.W. and G.S. Shedler. Empirically derived
micromodels for sequences of page exceptions. IBM
J. Res. Develop. 17 (1973) 86-100.

L4. Lewis, P.A.W. and P.C. Yue. Statistical analysis of
program reference patterns in a paging environment.
Proc. IEEE Conf., Boston (1971).

M1. Madison, A.W. and A.P. Batson. Characteristics of
program localities. Comm. ACM 19, 5 (May 1976) 285-294.

M2. Mattson, R.L., J. Gecsei, D.R. Slutz, and I.L. Traiger.
Evaluation techniques for storage hierarchies. IBM
Sys. J. 9, 2 (1970) 78-117.

M3. Morris, J.B. Demand paging through utilization of
working sets on the MANIAC II. Comm. ACM 15, 10
(Oct. 1972) 867-872.

M4. Morrison, J.F. User program performance in virtual
storage systems. IBM Sys. J. 12, 3 (1973) 216-237.

M5. Muntz, R.R. Analytic modeling of interactive systems.
IEEE Proceedings on Interactive Computer Systems,
Vol. 63, 6 (June 1975) 946-953.

O1. Opderbeck, H. and W.W. Chu. Performance of the page
fault frequency algorithm in a multiprogramming
environment. Proc. IFIP 74 Congress 235-241.




O2. Opderbeck, H. and W.W. Chu. The renewal model for
program behavior. SIAM J. Computing 4, 3 (Sept.
1975) 356-374.

P1. Prieve, B.G. Page partition replacement algorithm.
Ph.D. thesis, Dept. of Electrical Engineering and
Computer Science, University of California,
Berkeley (December 1973).

P2. Prieve, B.G. and R.S. Fabry. Evaluation of a page
partition replacement algorithm. Technical Report,
Bell Labs, Naperville, Illinois (October 1973).

P3. Prieve, B.G. and R.S. Fabry. VMIN - an optimal variable-
space page replacement algorithm. Comm. ACM 19, 5
(May 1976) 295-297.

R1. Rodriguez-Rosell, J. Experimental data on how program
behavior affects the choice of scheduler
parameters. Proc. 3rd ACM Symp. on Op. Sys. Prin.
(Oct. 1971) 156-163.

R2. Rodriguez-Rosell, J. Empirical working set behavior.
Comm. ACM 16, 9 (Sept. 1973) 556-560.

R3. Rodriguez-Rosell, J. and J.-P. Dupuy. The design,
implementation and evaluation of a working set
dispatcher. Comm. ACM 16, 4 (April 1973) 247-253.

R4. Ryan, T.A., Jr. and E.G. Coffman, Jr. A problem in
multiprogrammed storage allocation. IEEE Trans.
Comp. C-23, 11 (November 1974) 1116-1122.

S1. Sadeh, E. An analysis of the performance of the page
fault frequency (PFF) replacement algorithm. Proc.
ACM Symp. on Op. Sys. Prin. (Nov. 1975) 6-13.

S2. Sager, G.R. Dynamic storage allocation in a paged
virtual memory. Computer Science Group, U. of
Washington TR. 72-08-03 (Aug. 1972).

S3. Sager, G.R. Symbiotic scheduling. Proc. Second Texas
Conf. on Computing Systems (1973).

S4. Saltzer, J.H. A simple linear model of demand paging
performance. Comm. ACM 17, 4 (April 1974) 181-185.

S5. Saltzer, J.H. On the modeling of paging algorithms. ACM
Forum, Comm. ACM 19, 5 (May 1976) 307-308.

S6. Sayre, D. Is automatic folding of programs efficient
enough to displace manual? Comm. ACM 12, 12 (Dec.
1969) 656-660.




S7. Sekino, A. Optimal allocation of memory space and
processor time on multiprogrammed virtual-memory
computers. IBM T.J. Watson Research Center Report
RC 4317 (April 1973).

S8. Sekino, A. A note on biased resource allocation. IBM
T.J. Watson Research Center Report RC 4391 (August
1973).

S9. Shedler, G.S. and C. Tung. Locality in page reference
strings. SIAM J. Computing 3 (September 1972)
218-241.

S10. Shemer, J.E. and G.A. Shippey. Statistical analysis of
paged and segmented computer systems. IEEE Trans.
15 (December 1966) 855-863.

S11. Shemer, J.E. and S.C. Gupta. On the design of Bayesian
storage and allocation algorithms for paging and
segmentation. IEEE Trans. Comp. 18 (June 1969)
644-651.

S12. Slutz, D.R. and I.L. Traiger. A note on the calculation
of average working set size. Comm. ACM 17, 10
(October 1974) 563-565.

S13. Smith, A.J. Performance analysis of computer system
components. Ph.D. thesis, Stanford University
(1974).

S14. Smith, A.J. A modified working set paging algorithm.
IEEE Trans. Comp. (Sept. 1976) 907-914.

S15. Smith, J.L. Multiprogramming under a page on demand
strategy. Comm. ACM 10, 10 (Oct. 1967) 636-646.

S16. Spirn, J.R. Program locality and dynamic memory
management. Ph.D. Thesis, Elec. Engineering
Department, Princeton University (March 1973).

S17. Spirn, J.R. A model for dynamic memory allocation in a
paging machine. Proc. 8th Annual Princeton Conf.,
Elec. Engineering Department (March 1974).

S18. Spirn, J.R. and P.J. Denning. Experiments with program
locality. Proc. FJCC (1972) 611-621.

W1. Weizer, N. and G. Oppenheimer. Virtual memory management
in a paging environment. Proc. SJCC (1969) 249-256.

W2. Wilkes, M.V. A model of core space allocation in a
time-sharing system. Proc. SJCC (1969) 265-271.




W3. Wilkes, M.V. Automatic load adjustment in time-sharing
systems. ACM Workshop on Sys. Perf. Eval. (1971)
308-320.

W4. Wilkes, M.V. Time-sharing Computer Systems. American
Elsevier, third edition (1975).

W5. Wilkes, M.V. The dynamics of paging. Computer Journal
(Feb. 1973) 4-9.




APPENDICES 


These appendices contain the details about the programs
and experimental results of this thesis. Eight reference
strings were generated from the execution of a set of
programs by a trace monitor. Appendix A describes the trace
monitor. Appendix B describes the data reduction programs.
Appendix C presents the 40 lifetime curves obtained from
eight reference strings and five memory policies. Appendix
D presents the corresponding space-time curves. These
appendices support the conclusions of Chapter 3.

Appendices E-H support the work of Chapter 4. Appendix
E describes the programs used to extract semi-Markov model
parameters. Appendix F describes these values. Appendix G
describes the synthetic reference string generation programs
using these models. The goodness of fit between observed
and synthetic lifetime functions is treated in Appendix H.




APPENDIX A - THE TRACE MONITOR


The computer hosting the reference strings for this
thesis was an IBM System/360 Model 67 running under the TSS
operating system. The traces (recordings of address
sequences) were gathered during the summer of 1973 at the
IBM T.J. Watson Research Center. Because the widely-used
monitoring facility under TSS, SIPE [D1], was a sampling
monitor which did not record every instruction execution, we
had to use another monitor.

Such a monitor is the trace monitor, TRAM. It functions
by taking over and retaining control during program
execution. It accomplishes this by being more "powerful"
than the resident TSS operating system. All interrupts are
handled by TRAM; if they are not related to the program
being traced, TRAM passes them on to TSS. Thus, the
operation of tracing with TRAM can seriously degrade system
performance. For our data gathering, the program being
traced was run alone in a TSS system, including TRAM.

For each program instruction, TRAM outputs to tape a 16-
byte record with the format indicated in Table A.1. TRAM
does not record instructions executed in supervisor state.
All addresses contained in a TRAM output record are linear
name space locations which can be converted to page
references by dividing by the page size. (The page size for
the host IBM System/360 Model 67 computer was 4096 bytes.)
Recording addresses rather than page numbers allows more
flexibility - e.g., for the investigation of optimal page size.
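
The address-to-page conversion can be sketched as follows. This is only an illustration: the function name and the sample address are hypothetical and are not taken from TRAM or from the data reduction programs.

```python
PAGE_SIZE = 4096  # bytes per page on the host Model 67, as stated in the text


def page_number(address, page_size=PAGE_SIZE):
    """Map a linear name-space address to the number of the page holding it."""
    return address // page_size


# Because the trace keeps full addresses, the same record can be
# re-paged at other page sizes (hypothetical sample address).
for size in (1024, 2048, 4096):
    print(size, page_number(0x012345, size))
```

Keeping the page size as a parameter reflects the flexibility argument above: a trace of page numbers would fix the page size once and for all.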

The  manner  in  which  page  references  are  derived  from  the 
TRAM  output  trace  tape  is  heavily  dependent  on  the 
System/360  instruction  architecture.  Details  about  the 
method  used  are  described  in  Appendix  B.  We  make  the 
simplifying  assumption  that  no  instruction  or  operand 
address  sits  across  a  page  boundary. 




FIELD   SIZE     CONTENTS

R       1 bit    Relocation bit (always 1)
TID     7 bits   Task identifier
        1 byte   (Not used)
L       1 byte   Length field
OP      1 byte   Operation code
LOC     1 word   Logical instruction address
OP1     1 word   Logical address of first operand
OP2     1 word   Logical address of second operand

Table A.1 TRAM output record format




APPENDIX B - THE DATA REDUCTION PROGRAMS


The  sequencing  of  data  reduction  program  execution  and 
tape  creation  has  been  illustrated  in  Figure  3.2.2.  In  this 
appendix  we  outline  the  principal  procedures  of  each  data 
reduction  program. 

Program  STRIP 

The  output  tape  of  TRAM  contains  more  than  an  address 
reference  string.  The  STRIP  program  reduces  the  TRAM  output 
tape,  producing  a  tape  containing  a  sequence  of  page  names. 
In  particular,  STRIP  extracts  each  valid  address  from  the 
TRAM  tape  and  converts  it  to  a  4096-byte  page  number. 

Each entry in the TRAM tape represents a System/360
instruction. When STRIP processes one of these entries, it
generates from 1 to 3 page references. The first is to an
instruction page; the address is contained in the TRAM
record field LOC (Table A.1). The second is for the
instruction's first operand, if any; its address is
contained in the TRAM record field OP1. On the System/360,
this corresponds to an RX, RS, or SS format instruction
[I2]. The third page reference is for the second operand,
if any; its address is contained in the TRAM record field
OP2. On the System/360, this corresponds to an SS format
instruction [I2].

Care must be taken in the STRIP program to ensure that
the address in field OP1 is in fact a main memory reference.
TRAM makes an entry in that field for any RR-instruction or
immediate-operand information; neither type of entry
corresponds to a main memory reference. This means that
STRIP must check the operation code (stored in field OP) to
determine whether OP1 in fact references main memory. Some
of the instructions known to STRIP as not referencing memory
are contained in Table B.1.
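
A minimal sketch of the STRIP logic just described is given below. It is not the original program: the record layout (a named tuple with fields op, loc, op1, op2), the use of None for an absent operand, and the set of non-memory operation codes passed in by the caller are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical in-memory form of one TRAM record; field names follow the
# text (OP, LOC, OP1, OP2). None marks an operand that is not present.
TramRecord = namedtuple("TramRecord", "op loc op1 op2")

PAGE_SIZE = 4096


def strip(records, non_memory_opcodes, page_size=PAGE_SIZE):
    """Yield from 1 to 3 page numbers for each traced instruction."""
    for rec in records:
        # 1. The instruction page, always present.
        yield rec.loc // page_size
        # 2. The first operand, unless the operation code says that OP1
        #    does not name a main memory location.
        if rec.op1 is not None and rec.op not in non_memory_opcodes:
            yield rec.op1 // page_size
        # 3. The second operand (SS format instructions only).
        if rec.op2 is not None:
            yield rec.op2 // page_size


# Illustrative filter only: operation codes 0x00-0x3F, assumed here to
# stand for the RR-format instructions.
rr_opcodes = set(range(0x00, 0x40))
trace = [
    TramRecord(0x58, 0x1000, 0x5000, None),    # one memory operand
    TramRecord(0x1A, 0x1004, 0x0007, None),    # register-only, OP1 ignored
    TramRecord(0xD2, 0x1006, 0x2000, 0x9000),  # two memory operands
]
print(list(strip(trace, rr_opcodes)))
```

Passing the filter set as an argument keeps the sketch independent of any particular list of excluded instructions.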

Program  LRU/OPT 


The LRU/OPT program inputs the tape produced by STRIP,
calculates LRU and OPT performance data, and produces
another tape in a format more suitable for processing by the
variable partition memory policies. This program comprises
two main procedures: a) an LRU procedure maintains an LRU
stack, counts LRU stack distance frequencies, and computes
lifetime and space-time curves from this data; it also
creates a tape with entries (page name, LRU stack distance)
and a temporary disk file. This file is subsequently
processed by b) the OPT procedure, which calculates OPT
lifetimes and space-time costs by examining the information
in the file in reverse [M2].




DECIMAL VALUE   OPERATION CODE

00-63           -     RR format
65              LA    Load Address
69-71           -     Branches
77              BAS   Branch and Store (Model 67 only)
134-143         -     Branches, Shifts
177             LRA   Load real address (Model 67 only)

Table B.1 Some of the instruction references deleted by
STRIP




The (page name, LRU stack distance) format of the
LRU/OPT output tape facilitates further processing for other
policies, because it obviates searching an LRU stack for the
referenced page. Therefore, we save much effort which would
otherwise be duplicated in the three analyzers for the DWS,
PFF, and VMIN policies.
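
The LRU half of this processing can be sketched as follows. The list-based stack and the function names are illustrative assumptions (the original program is not reproduced here); a first reference to a page is given the distance None, standing for an infinite stack distance.

```python
def lru_stack_distances(pages):
    """Return (page name, LRU stack distance) entries for a reference string."""
    stack = []    # most recently referenced page first
    entries = []
    for page in pages:
        if page in stack:
            distance = stack.index(page) + 1   # 1 = top of stack
            stack.remove(page)
        else:
            distance = None                    # first reference
        stack.insert(0, page)
        entries.append((page, distance))
    return entries


def lru_lifetime(entries, memory_size):
    """Mean number of references between faults under LRU with a fixed
    partition of memory_size pages (a reference faults when its stack
    distance exceeds the partition size)."""
    faults = sum(1 for _, d in entries if d is None or d > memory_size)
    return len(entries) / faults


entries = lru_stack_distances([1, 2, 1, 3, 2, 1])
print(entries)
print([lru_lifetime(entries, size) for size in (1, 2, 3)])
```

One pass over the string yields the distances, after which the fault count for every memory size follows from the distance frequencies alone.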

Program  DWS 

The DWS program inputs the LRU/OPT output tape and
calculates DWS mean memory sizes, lifetime values, and exact
and approximate space-time costs for a range of window
sizes. It does so by maintaining an underlying LRU stack
representing the memory allocations of all window sizes.
Pointers into the LRU stack specify the current memory size
(and contents) for each window size (see Figure B.1). For
each window size T(i), denote by x(i) the sum of the
resident set sizes at each reference, by y(i) the sum of the
resident set sizes at each page fault, and by m(i) the
number of page faults. For each reference, the E and
S data structures must be updated correctly and relevant
data gathered. Two actions are taken at each reference, one
to gather resident set size data for all window sizes, and
the other to gather page fault data for all faulting window
sizes. Suppose the reference at time t+1 is (p,d). We do
the following:

1. Determine pages leaving the window: For k=1,...,N, if the
time of last reference t(E(k)) > t-T(k), then set
E(k) := E(k) - 1. In any case, set x(k) := x(k) + E(k).

2. Collect page fault data: Scan E for the largest i (call
it I) such that E(I) < d; for k=I+1,...,N set
E(k) := E(k) + 1, m(k) := m(k) + 1, and
y(k) := y(k) + E(k).

3. Update the stack: place (p(d), t(1) (= t+1)) on top of the
stack and push down intervening entries.

At the end of the simulation, the quantities of interest can
be calculated from the gathered data. The mean memory size
is s(T(i)) = x(i)/L and the lifetime is
LT(T(i)) = L/m(i), where L is the reference string
length. The exact space-time product is
ST(T(i)) = x(i) + A2·y(i), where A2 is the auxiliary memory
access speed; the approximate space-time is
ST*(T(i)) = x(i)·(1 + A2).
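
The quantities x, y, and m, and the measures derived from them, can be illustrated for a single window size by direct simulation. The sketch below is not the stack-pointer procedure of the DWS program; it recomputes the working set at every reference, and it assumes that the resident set size is measured just after each reference is serviced. All names are illustrative.

```python
def dws_statistics(pages, window, a2):
    """Simulate a working set policy with one window size.

    Returns x (sum of resident set sizes over all references), y (the
    same sum taken at page faults only), m (the number of page faults),
    and the mean size, lifetime, and exact space-time product.
    """
    last_ref = {}             # page -> time of last reference
    x = y = m = 0
    for t, page in enumerate(pages, start=1):
        previous = last_ref.get(page)
        # Fault if the page was not referenced in the preceding window.
        fault = previous is None or t - previous > window
        last_ref[page] = t
        # Pages referenced within the last `window` references.
        size = sum(1 for tq in last_ref.values() if t - tq < window)
        x += size
        if fault:
            m += 1
            y += size
    length = len(pages)
    return {"x": x, "y": y, "m": m,
            "mean_size": x / length,        # s = x/L
            "lifetime": length / m,         # LT = L/m
            "space_time": x + a2 * y}       # ST = x + A2*y


print(dws_statistics([1, 2, 1, 3, 2, 1], window=2, a2=10))
```

Scanning every page at every reference is far slower than the pointer method, which is why this form is useful only as a check on small strings.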

The search in step 2 need not go farther than the last
window size which does not fault (if the search is in
increasing window size), because of the memory inclusion
property of the DWS window sizes [D15,F6]. Denning and
Slutz have devised a more efficient counting procedure if
only the operating points for the lifetime function are
desired [D6,D16]. However, because the exact space-time
product was needed, the procedure described here was used.

[Figure B.1 shows the LRU stack, each entry carrying a page
name and its time of last reference, with pointers into the
stack for the window sizes. It assumes that
T1 < T2 < ... < TN are given window sizes, and that the input
is a list of pairs (r, d), where r is a page number and d is
its LRU stack distance.]

Figure B.1 Data structures for the DWS analyzer

Program  PFF 

The PFF program inputs the LRU/OPT output tape and
calculates PFF mean memory sizes, lifetime values, and
space-time costs for a range of threshold window sizes. The
PFF data structures are similar to the DWS data structures;
however, the PFF updating procedures are different. DWS
step 1 (pages leaving the window) is not present for PFF.
DWS step 2 (collect fault data) involves, for PFF, a search
over all k to determine if E(k) > d. Updating
the E structure must be reprogrammed from the DWS version,
because PFF deallocates, at page fault time, all pages not
referenced since the time of the last page fault. The use
of the time of last reference field in the LRU stack,
together with a new data structure containing the time of
the last page fault for each window size, allows this
calculation to be performed.
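
The PFF deallocation rule can be sketched for a single threshold as follows. The sketch assumes the usual statement of PFF: pages are released only when the time since the previous fault exceeds the threshold; otherwise the faulting page is simply added. It simulates one threshold directly rather than using the shared stack of the PFF program, and all names are illustrative.

```python
def pff_statistics(pages, threshold):
    """Simulate PFF with one threshold; return (x, y, m).

    x is the sum of resident set sizes over all references, y the same
    sum taken at page faults only, and m the number of page faults.
    """
    resident = set()
    last_ref = {}         # page -> time of last reference
    last_fault = 0        # time of the previous page fault
    x = y = m = 0
    for t, page in enumerate(pages, start=1):
        if page not in resident:
            m += 1
            if t - last_fault > threshold:
                # Release every page not referenced since the last fault.
                resident = {q for q in resident
                            if last_ref[q] >= last_fault}
            resident.add(page)
            last_fault = t
            y += len(resident)
        last_ref[page] = t
        x += len(resident)
    return x, y, m


x, y, m = pff_statistics([1, 2, 3, 1, 1, 1, 4], threshold=2)
print(x, y, m, "lifetime:", 7 / m)
```

In this run page 2 is released at the final fault because it was last referenced before the previous fault, whereas pages 1 and 3 are retained.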

This added complexity is necessitated by the PFF
anomaly behaviour (see Appendix C and [F6]). While the DWS
pointer values maintain an ordering (E(1) < E(2) < ... < E(N))
because of the inclusion property, it is possible for the
PFF pointer values to lose this ordering property. Thus,
the search to collect fault data must be exhaustive
(k=1,...,N) for PFF.

Program  VMIN 

The VMIN program inputs the LRU/OPT output tape and
calculates VMIN mean memory sizes and lifetime values for a
range of forward window sizes. Although VMIN is formulated
as a lookahead policy, its lifetime function can be
evaluated without lookahead. The method used is based on an
extension to interreference interval counting. We did not
use the approximation s(T) - v(T) = (T-1)·m(T) but instead
programmed the exact method including end corrections
[D6,D16].
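
The fault-counting part of interreference interval counting can be sketched as below. The sketch assumes the property usually stated for VMIN, namely that with window T it faults on exactly the references whose backward interreference interval exceeds T, together with all first references. It does not compute mean memory sizes and does not reproduce the end corrections of the exact method; the names are illustrative.

```python
from collections import Counter


def interreference_histogram(pages):
    """Count backward interreference intervals in one pass.

    Returns (histogram, first_references): histogram[k] is the number of
    references whose previous reference to the same page was k
    references earlier.
    """
    last_ref = {}
    histogram = Counter()
    first_references = 0
    for t, page in enumerate(pages, start=1):
        if page in last_ref:
            histogram[t - last_ref[page]] += 1
        else:
            first_references += 1
        last_ref[page] = t
    return histogram, first_references


def fault_count(histogram, first_references, window):
    """Faults m(T): first references plus intervals longer than T."""
    return first_references + sum(
        count for interval, count in histogram.items() if interval > window)


hist, firsts = interreference_histogram([1, 2, 1, 3, 2, 1])
length = 6
for window in (1, 2, 3):
    m = fault_count(hist, firsts, window)
    print(window, m, length / m)    # window, faults, lifetime
```

The histogram is built once, so lifetimes for any number of window sizes follow without rereading the reference string.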

We  did  not  gather  VMIN  space-time  data  because:  a)  the 
counting  procedure  was  not  capable  of  producing  it,  and  b) 
this  information  was  judged  to  be  of  marginal  value  anyway. 


Table B.2 DWS mean memory sizes for the standard
window sizes


Table B.3 PFF mean memory sizes for the standard
threshold window sizes


[The body of Table B.4 is illegible in the scanned source.
It lists, for each window size, the mean memory size for
each reference string.]

Table B.4 VMIN mean memory sizes for the standard
window sizes


APPENDIX C - THE OBSERVED LIFETIME FUNCTIONS


This  appendix  comprises  all  the  plots  of  the  lifetime 
functions  of  five  memory  policies  on  the  eight  trace  tapes. 

Figures C.1-C.16 show the lifetime curves. The graphs
have been drawn with different vertical and horizontal
scales, because the various reference strings have
behaviours on widely different space and time scales. The
maximum lifetime possible for each reference string,
corresponding to reference string length divided by number
of distinct pages referenced, is indicated on each graph.
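
As a small illustration of the maximum lifetime (the function name and the sample string are hypothetical):

```python
def maximum_lifetime(pages):
    """Reference string length divided by the number of distinct pages.

    Every distinct page must fault at least once, so no policy can
    achieve a longer mean time between faults than this.
    """
    return len(pages) / len(set(pages))


print(maximum_lifetime([1, 2, 1, 3, 2, 1]))
```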


The odd-numbered figures of the set of graphs contain
the lifetime functions for OPT, LRU, and DWS; the even-
numbered figures contain the lifetime functions for VMIN,
DWS, and PFF. We graphed DWS against the fixed partition
policy OPT to determine if there were instances where DWS
improved upon OPT. There were in fact many such instances
(all reference strings except P3 and P7).


For  the  variable  partition  policy  curves,  the  parameter 
values  producing  the  operating  points  may  be  found  by 
consulting  Tables  B.2-B.4  and  comparing  with  the 
corresponding  graphs. 


An  interesting  DWS  performance  phenomenon  occurred  on 
reference  string  P5.  Two  primary  lifetime  knees  were 
detected,  one  at  mean  memory  size  17  (lifetime  of  16,100) 
and  one  at  42  (lifetime  of  120,400). 

Several interesting PFF performance phenomena were
discovered in these experiments. One is a PFF anomaly: an
increase in the PFF window size does not necessarily imply
an increase in the mean memory allocation or a decrease in
the page fault rate [F6]. The anomaly with respect to
memory requirement was discovered in this thesis work when
processing the first half of reference string P2. A PFF
threshold of 12,000 produced an operating point (37.24,
3,234), whereas a PFF threshold of 12,500 produced an
operating point (37.09, 3,250). This instance led to the
generalizations in [F6]. DWS and VMIN cannot exhibit this
anomaly because they satisfy an inclusion property with
respect to their window sizes [D15,D16].
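
A check for this anomaly over a table of operating points might look like the sketch below. The function and the tuple layout (threshold, mean memory size, lifetime) are illustrative assumptions; the two data points are the ones quoted above.

```python
def find_anomalies(points):
    """Report adjacent thresholds between which the mean memory size
    falls or the lifetime falls (i.e., the fault rate rises)."""
    points = sorted(points)
    found = []
    for (t0, s0, l0), (t1, s1, l1) in zip(points, points[1:]):
        if s1 < s0:
            found.append((t0, t1, "memory"))
        if l1 < l0:
            found.append((t0, t1, "lifetime"))
    return found


p2_first_half = [(12000, 37.24, 3234), (12500, 37.09, 3250)]
print(find_anomalies(p2_first_half))
```

For the quoted pair only the memory anomaly is reported: the larger threshold gives a smaller mean allocation while the lifetime still rises.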

A second phenomenon is the PFF gap behaviour. It may be
seen in Figure C.10. All PFF parameter values tested in the
threshold range (7,000, 12,000) produced the single
operating point (50.66, 4,727), whereas a threshold of 6,500
produced the point (30.01, 3,467) and a threshold of 13,000
produced the point (108.43, 6,599). Corresponding to the
lower gap, a 7.69 percent increase in threshold window size
produced a 68.81 percent increase in mean memory allocation
and a 36.36 percent increase in lifetime value; for the
upper gap, an 8.33 percent increase in threshold window size
produced a 114.03 percent increase in mean memory allocation
and a 39.59 percent increase in lifetime value. The
corresponding increases in space-time cost (see Figure D.10)
are 8.85 percent for the lower gap and 36.67 percent for the
upper gap. The gap behaviour of PFF is characterized by a
small increase of threshold window size producing a large
increase in mean memory allocation (without a
correspondingly large increase in lifetime value or space-
time cost). This behaviour can give trouble in memory
management because the large increment in memory demand by a
small increment in PFF parameter may generate thrashing.
Such behaviour has not been observed for either DWS or VMIN.

We see from this discussion that either the anomaly
behaviour or the gap behaviour for PFF makes the stable
control of a PFF policy difficult (see Section 3.5).
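
The percentages quoted for the two gaps can be rechecked from the operating points with a few lines of arithmetic. The helper name and the dictionary layout are illustrative; the lifetime figures computed this way may differ from the quoted ones in the last digit, presumably because the published operating points are rounded.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


# (value below the gap, value above the gap), taken from the text.
gaps = {
    "lower": {"threshold": (6500, 7000),
              "memory": (30.01, 50.66),
              "lifetime": (3467, 4727)},
    "upper": {"threshold": (12000, 13000),
              "memory": (50.66, 108.43),
              "lifetime": (4727, 6599)},
}

for name, gap in gaps.items():
    for quantity, (old, new) in gap.items():
        print(name, quantity, round(percent_increase(old, new), 2))
```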


[Plots not reproduced; the vertical axis of each is lifetime.]

Figure C.1 P1 lifetime functions for OPT, LRU, and DWS

Figure C.2 P1 lifetime functions for VMIN, DWS, and PFF

Figure C.3 P2 lifetime functions for OPT, LRU, and DWS

Figure C.4 P2 lifetime functions for VMIN, DWS, and PFF

Figure C.5 P3 lifetime functions for OPT, LRU, and DWS

Figure C.6 P3 lifetime functions for VMIN, DWS, and PFF

Figure C.7 P4 lifetime functions for OPT, LRU, and DWS

Figure C.8 P4 lifetime functions for VMIN, DWS, and PFF

Figure C.9 P5 lifetime functions for OPT, LRU, and DWS

Figure C.10 P5 lifetime functions for VMIN, DWS, and PFF

Figure C.11 P6 lifetime functions for OPT, LRU, and DWS

Figure C.12 P6 lifetime functions for VMIN, DWS, and PFF

Figure C.13 P7 lifetime functions for OPT, LRU, and DWS

Figure C.14 P7 lifetime functions for VMIN, DWS, and PFF

Figure C.16 P8 lifetime functions for VMIN, DWS, and PFF


APPENDIX D - THE OBSERVED SPACE-TIME COSTS


This appendix comprises the plots of the space-time
curves of four memory policies (VMIN excluded) on the eight
trace tapes.

Figures D.1-D.16 show the space-time curves. As in
Appendix C, common scales have not been used for the
vertical or for the horizontal axes. The odd-numbered
figures of the set of graphs contain the space-time costs
for OPT, LRU, and DWS; the even-numbered figures contain the
space-time costs for DWS and PFF. The approximate space-
time performance measure for DWS has been discussed in
Section 3.2. For the variable partition policy curves, the
parameter values producing the operating points may be found
by consulting Tables B.2-B.4 and comparing with the
corresponding graphs.

For these graphs, a speed ratio A of 10^[exponent
illegible] was chosen. This value was the same one used in
the study by Chu and Opderbeck [C2].

The gap behaviour exhibited by PFF in Figure D.10 has
been described in Appendix C.


[Plots not reproduced; where the axis labels are legible, the
vertical axis is space-time cost and the horizontal axis is
resident set size.]

Figure D.1 P1 space-time costs for OPT, LRU, and DWS

Figure D.2 P1 space-time costs for DWS and PFF

Figure D.3 P2 space-time costs for OPT, LRU, and DWS

Figure D.4 P2 space-time costs for DWS and PFF

Figure D.5 P3 space-time costs for OPT, LRU, and DWS

Figure D.6 P3 space-time costs for DWS and PFF

Figure D.7 P4 space-time costs for OPT, LRU, and DWS

Figure D.8 P4 space-time costs for DWS and PFF

Figure D.9 P5 space-time costs for OPT, LRU, and DWS

Figure D.10 P5 space-time costs for DWS and PFF

Figure D.11 P6 space-time costs for OPT, LRU, and DWS

Figure D.12 P6 space-time costs for DWS and PFF

Figure D.13 P7 space-time costs for OPT, LRU, and DWS

Figure D.14 P7 space-time costs for DWS and PFF

Figure D.15 P8 space-time costs for OPT, LRU, and DWS

Figure D.16 P8 space-time costs for DWS and PFF


APPENDIX E - THE PARAMETER VALUE EXTRACTION PROGRAMS

There are two programs for processing a page reference
string and obtaining the values of parameters of the semi-
Markov policy model (Chapter 4).

Program  XDWS 

This program is a modification of the DWS simulator
described in Appendix B. It accepts a window size,
processes a trace tape under DWS for that window size, and
outputs the memory transition matrix, the maximum memory
size allocated, the π and φ vectors, total holding time and
mean holding time (h(i)) within a state, and the empirical
holding time (between transitions) distribution.
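
Part of this extraction can be sketched as follows, assuming that a state is a resident set size and that the simulator supplies the resident set size at every reference. The function and the sample sequence are illustrative, and the two probability vectors are not computed here.

```python
from collections import defaultdict


def semi_markov_parameters(sizes):
    """Extract transition counts and holding times from a sequence of
    resident set sizes, one per reference.

    Returns (transitions, mean_holding, max_size): transitions[(i, j)]
    counts moves from size i to size j, and mean_holding[i] is the mean
    number of consecutive references spent at size i per visit.
    """
    transitions = defaultdict(int)
    holding = defaultdict(list)
    state, held = sizes[0], 1
    for size in sizes[1:]:
        if size == state:
            held += 1
        else:
            transitions[(state, size)] += 1
            holding[state].append(held)
            state, held = size, 1
    holding[state].append(held)      # close the final visit
    mean_holding = {s: sum(v) / len(v) for s, v in holding.items()}
    return dict(transitions), mean_holding, max(sizes)


print(semi_markov_parameters([1, 2, 2, 2, 3, 3, 2]))
```

Dividing each row of the transition counts by its row total would give an empirical transition matrix.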


Program  XPFF 

This program is a modification of the PFF simulator
described in Appendix B. It produces the same outputs, for
PFF, as does the XDWS program.

The relative costs of running these programs are
compared with the costs of running the DWS and PFF lifetime
curve/space-time cost computations in Table E.1.

[The body of Table E.1 is largely illegible in the scanned
source. Its columns are PROGRAM and COST.]

Table E.1 Relative costs of parameter value
extraction programs and data
reduction programs




APPENDIX  F  -  THE  OBSERVED  PARAMETER  VALUES 


Only six of the eight trace tapes in our sample were
processed for the policy models. Tapes P2 and P8 were
excluded because their knee lifetimes were close to the
maximum lifetime; we believed that these programs would show
only minor changes in resident set sizes and would not test
the full power of the modeling technique. For each of the
remaining tapes, programs XDWS and XPFF were applied with
the parameter set to the DWS or PFF primary knee.

Highlights of these runs are given in Table F.1. There
is no indication that PFF consistently allocates a larger
maximum memory size than DWS.


The empirical π vector and its counterpart over all virtual
time differ significantly for both DWS and PFF. To measure
this, we identified the 20 percent of the states of each
vector that occupied the most probability (i.e., the n/5
most probable states), and determined the probability of
being in one of these states. For the π vector (states at
transition time), the programs of the ensemble spent 33
percent of transitions in these states for both policies.
For the vector over all virtual time, the programs spent 81
percent of execution time in these states for DWS and 86
percent for PFF. Typically, the largest entry in the π
vector is less than 0.05 for both DWS and PFF and the
smallest entry in the π vector is approximately 0.001.
Entries in the virtual-time vector for both DWS and PFF
exhibit greater variability, ranging from 0.74 to 4 x 10^-7.
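
The concentration measure used here (the probability mass held by
the n/5 most probable states) can be sketched as follows. The
function name is hypothetical, and rounding n/5 down while keeping
at least one state is an assumption; the text does not say how
fractional values of n/5 were handled.

```python
def top_fifth_mass(probabilities):
    """Total probability of the n/5 most probable states."""
    n = len(probabilities)
    k = max(1, n // 5)  # assumption: round n/5 down, keep at least one state
    return sum(sorted(probabilities, reverse=True)[:k])
```

A uniform vector gives a mass of about 0.2, so values such as 0.81
or 0.86 indicate that a few states dominate.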


Tape P6 has an interesting feature under PFF: 95 percent
of execution time is spent in two states, size 36 (74
percent) and size 20 (21 percent). Each state was entered
only three times during execution.

The  average  coefficients  of  variation  of  the 
distributions  of  memory  sizes  allocated  during  execution 
were  1.18  for  DWS  and  1.11  for  PFF. 

Discrete empirical holding time distributions were
measured. The data themselves are of little interest, but
the coefficients of variation for DWS and PFF were,
respectively, 3.5 and 3.6. Since an exponential distribution
has a coefficient of variation of 1, this shows that an
exponential is a poor approximation to the distributions.
(See also [G4].)
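
As an illustration of the statistic quoted above, the coefficient
of variation of a discrete empirical distribution can be computed
as below. This is a sketch only; the histogram layout (holding
time mapped to observed count) and the function name are
assumptions, not the format used by the extraction programs.

```python
import math

def coefficient_of_variation(histogram):
    """histogram maps a holding time to the number of times it was observed."""
    n = sum(histogram.values())
    mean = sum(t * c for t, c in histogram.items()) / n
    var = sum(c * (t - mean) ** 2 for t, c in histogram.items()) / n
    return math.sqrt(var) / mean  # standard deviation divided by mean
```

Values well above 1, such as those reported here, correspond to
distributions with a much heavier tail than an exponential.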




Table F.1 Detailed observed parameter values




APPENDIX G - THE SYNTHETIC REFERENCE STRING GENERATION PROGRAMS

There are five synthetic reference string generation
programs, according to the five combinations of macro- and
micromodels noted in Chapter 4.

Program  GDL 

This program generates a synthetic reference string
according to the DWS semi-Markov policy model with an LRUSM
micromodel. The program accepts as input the following
quantities:


- the number of references to be generated ([...] 1,299,968)
- the number of distinct pages to be referenced
- the window size T1 prod[...]
- the window size T2 prod[...]
- the LRU stack distance counts
- the empirical holding time [...] histogram on 26 data points
- the state space size and the transition matrix M under DWS


After starting the reference process by deterministically
referencing two pages, the GDL program proceeds through four
routines. The first routine determines the successor state
NEXT to the current state (memory size) NOW by sampling the
NOW row of the M matrix using a pseudorandom number
generator. The second routine determines the holding time
HOLD within state NOW by sampling the state-independent
empirical holding time distribution using another
pseudorandom number generator. The third routine generates
references within a locality; it corresponds to the
micromodel. For HOLD-1 references, the empirical LRU stack
distance distribution is sampled by yet another pseudorandom
number generator to produce references. However, the stack
distance distribution is normalized to D positions, where
D=NOW if NEXT>NOW and D=NOW-1 otherwise. There are HOLD-1,
and not HOLD, references because a transition reference has
already put the model in state NOW. The fourth and final
routine generates a transition reference. If NEXT=NOW-1, we
do not simulate a faulting reference in the model but
generate a reference NEXT/2 positions down in the LRU stack.
Otherwise, a reference is generated to a stack position
greater than NEXT, simulating a faulting reference. This is
currently done somewhat arbitrarily by using the ratio
T2/(number of references) to determine where beyond stack
position NOW the reference will come from. The ratio is an
attempt to reproduce interreference intervals which will
make the synthetic DWS point at which the maximum lifetime
is reached agree well with the actual point. After
execution proceeds through the four routines, corresponding
to one phase and one transition behaviour, NOW is set equal
to NEXT and the four routines are executed again. Execution
terminates when the required number of references has been
generated.
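
The four routines can be sketched in a few lines. This is a
simplified illustration under stated assumptions, not the GDL
program: it emits LRU stack distances only, uses one pseudorandom
generator instead of one per routine, starts in state 2, and
places a faulting reference at position NEXT+1 rather than using
the T2 ratio, whose exact formula is not given in the text. All
names are hypothetical.

```python
import random

def sample(weights, rng):
    """Return an index drawn in proportion to the given weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]

def gdl_sketch(M, hold_values, hold_weights, dist_counts, n_refs, seed=1):
    """Return n_refs LRU stack distances from a GDL-like generator.

    M[i][j] is the probability of moving from memory size i to size j;
    dist_counts[k] is the count observed for stack distance k + 1.
    """
    rng = random.Random(seed)
    out = [1, 1]   # assumption: stand-in for the two start-up references
    now = 2        # assumption: the model starts in state (memory size) 2
    while len(out) < n_refs:
        # Routine 1: successor state from the NOW row of M.
        nxt = sample(M[now], rng)
        # Routine 2: state-independent holding time.
        hold = hold_values[sample(hold_weights, rng)]
        # Routine 3: HOLD-1 references from the distribution cut to D positions.
        d = max(1, now if nxt > now else now - 1)
        for _ in range(hold - 1):
            out.append(1 + sample(dist_counts[:d], rng))
        # Routine 4: the transition reference.
        if nxt == now - 1:
            out.append(max(1, nxt // 2))   # no fault: NEXT/2 positions down
        else:
            out.append(nxt + 1)            # assumption: nearest position beyond NEXT
        now = nxt
    return out[:n_refs]
```

Passing the first d counts as weights has the same effect as
renormalizing the distance distribution to D positions.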


During a phase, the LRUSM micromodel does not generate a
stack distance greater than NOW. At transition time
corresponding to a page fault, we introduce the possibility
of generating a first reference to a page. This is done by
using a pseudorandom number generator to simulate the
flipping of a biased coin. The coin is biased so that the
appropriate number of first references are generated in the
reference string.


The synthetic trace tapes are processed by the data
reduction programs described in Appendix B; these programs
assume the tape format is (page name, LRU stack distance).
Accordingly, the GDL program outputs its synthetic
references in this format.


Program  GDD 

This program generates a synthetic reference string
according to the DWS semi-Markov policy model with a
deterministic sawtooth micromodel. The program accepts the
same input quantities as the GDL program, except for the LRU
stack distance counts, which it does not need. The GDD
program also has four routines and is the same as the GDL
program except for the third routine (the micromodel). The
sawtooth micromodel will generate a page reference substring

1, 2, ..., D, D, D-1, ..., 2, 1, 1, ... . Except for the first run,
this will correspond to a distance substring

1, 2, ..., D, 1, 2, ..., D, 1, ... . We assume that this sequence
also holds for the first run and thus are able to generate
references from a sawtooth micromodel very easily.
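
The correspondence between the sawtooth page substring and the
distance substring can be checked with a short sketch. The two
helper names are hypothetical; first references are reported as
None rather than as an infinite distance.

```python
def sawtooth_pages(d, n):
    """First n page references of the sawtooth 1..D, D..1, 1..D, ..."""
    up = list(range(1, d + 1))
    cycle = up + up[::-1]
    return [cycle[i % len(cycle)] for i in range(n)]

def lru_distances(pages):
    """LRU stack distance of each reference (None for a first reference)."""
    stack, out = [], []   # stack[0] is the most recently used page
    for p in pages:
        if p in stack:
            out.append(stack.index(p) + 1)
            stack.remove(p)
        else:
            out.append(None)
        stack.insert(0, p)
    return out
```

With D=3, the pages 1, 2, 3, 3, 2, 1, 1, 2, 3 give three first
references followed by the distances 1, 2, 3, 1, 2, 3.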


Program  GPL 

This program generates a synthetic reference string
according to the PFF semi-Markov policy model with an LRUSM
micromodel. The program accepts the PFF equivalents of the
GDL input data. The GPL program also has four routines and
is the same as the GDL program except for the fourth routine
(the transition reference). Every transition under PFF is a
page fault and this is reflected in the model by simulating
a faulting reference.

When the empirical holding time distribution HT is
input, the conditioned distributions of holding times less
than, and greater than or equal to, the PFF parameter value
THRESH can be computed immediately using the value of T1.
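
A minimal sketch of the conditioning step follows. It takes the
threshold as an argument, since the text does not spell out how
THRESH is obtained from T1; the histogram layout and the function
name are likewise assumptions.

```python
def condition_on_threshold(hist, thresh):
    """Split a holding-time histogram at thresh and renormalise each part."""
    below = {t: c for t, c in hist.items() if t < thresh}
    above = {t: c for t, c in hist.items() if t >= thresh}

    def normalise(part):
        total = sum(part.values())
        return {t: c / total for t, c in part.items()} if total else {}

    return normalise(below), normalise(above)
```

Each returned dictionary is a probability distribution over the
holding times on its side of the threshold.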

Program GPD

This program generates a synthetic reference string
according to the PFF semi-Markov policy model with a
deterministic sawtooth micromodel. The GPD program is to
the GPL program as the GDD program is to the GDL program.


Program  GL 

This program generates a synthetic reference string
according to the LRU stack model. The program accepts as
input the number of references to be generated, the number
of distinct pages to be referenced, and the LRU stack
distance counts. It uses a pseudorandom number generator to
sample the empirical LRU stack distance distribution and
forms a (page number, LRU stack distance) pair for each
reference. First references are generated when either a
stack distance corresponding to infinity or a stack distance
greater than the current stack size is chosen.
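
The GL procedure can be sketched as below. This is an
illustration under assumptions, not the original program: the
count for infinite distance is passed separately, first references
are marked with None, at least one page is assumed, and once all
distinct pages have been used a would-be first reference falls
back to the deepest stack position. The function and parameter
names are hypothetical.

```python
import random

def gl_sketch(dist_counts, inf_count, n_refs, n_pages, seed=1):
    """Return n_refs (page number, LRU stack distance) pairs; needs n_pages >= 1."""
    rng = random.Random(seed)
    stack = []        # LRU stack, most recently used page first
    next_page = 1     # next page number not yet referenced
    out = []
    weights = list(dist_counts) + [inf_count]   # last slot = infinite distance
    positions = range(1, len(weights) + 1)
    while len(out) < n_refs:
        d = rng.choices(positions, weights=weights)[0]
        first = d == len(weights) or d > len(stack)
        if first and next_page <= n_pages:
            page, dist = next_page, None        # first reference to a new page
            next_page += 1
        else:
            dist = min(d, len(stack))           # assumption: clamp when no page is left
            page = stack.pop(dist - 1)
        stack.insert(0, page)
        out.append((page, dist))
    return out
```

The very first pair is always a first reference, because any
sampled distance exceeds the empty stack.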

The data flow for an example of synthetic reference
string generation is given in Figure G.1. The relative
costs of the synthetic generation programs are given in
Table G.1.




Figure G.1 Data flow for the GDL program




[Table G.1 body: columns PROGRAM and COST, with rows GDL, GDD,
GPL, GPD and GL; the cost figures are not legible in this copy.]

Table G.1 Relative costs of the synthetic generation programs




APPENDIX H - GOODNESS OF FIT BETWEEN ACTUAL AND SYNTHETIC LIFETIME CURVES


The fits are considered only for the LRUSM and the DWS
and PFF policy models, both with an LRUSM micromodel. No
deterministic micromodel data is given here, because they
were observed visually to give very poor fits.


The goodness of fit of the models is summarized in Table
H.1. It gives the relative percentage difference between
the actual lifetime and the model's lifetime, averaged over
a set of (typically) 15 equally-spaced grid points. The
LRUSM is evidently the best, performing progressively better
under VMIN, PFF, and DWS. The DWS policy model is the next
best, averaging about 6.5 percentage points behind the
LRUSM. The PFF policy model is the worst of the three,
averaging about 11.3 percentage points behind the DWS policy
model.
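
The averaged measure can be sketched as follows. The text does
not state the exact formula; this sketch assumes the difference is
taken as an absolute value relative to the actual lifetime at each
grid point. The function name is hypothetical.

```python
def mean_relative_percentage_difference(actual, model):
    """Average of 100*|actual - model|/actual over matching grid points."""
    diffs = [100.0 * abs(a - m) / a for a, m in zip(actual, model)]
    return sum(diffs) / len(diffs)
```

For actual lifetimes 100 and 200 and model lifetimes 90 and 250,
the per-point differences are 10 and 25 percent, averaging 17.5.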


Two comments should be made about the PFF entries. The
gap behaviour of string P5 makes it impossible to define a
set of grid points at which relative percentage difference
calculations can be performed. The lack of entries for P6
reflects the fact of all model-generated lifetime curves
exhibiting gap behaviour, even though the actual PFF
lifetime curve did not exhibit this behaviour. Again, no
goodness of fit calculation was possible.

It is not worthwhile to present all 54 graphs of the
synthetic lifetime curves under the memory policies.
However, the complete set of P1 actual and synthetic curves
are given in Figures H.1-H.9. These graphs show the typical
result, that all policies fail to predict adequately the
flattening of the DWS and PFF lifetime near the maximum
possible lifetime. The largest relative percentage errors
tend to occur at small memory sizes and small memory policy
parameter values.


Table H.2 shows the relative percentage difference
between  DWS  lifetimes  achieved  by  the  actual  and  synthetic 
curves  at  the  actual  DWS  knee  memory  size.  This  gives  an 
indication  of  the  ability  of  a  policy  model  to  reproduce  the 
behaviour  from  which  its  input  data  was  determined.  We  see 
that  the  DWS  policy  model  and  the  LRUSM  are  comparable  (but 
not  very  good)  in  their  ability  to  duplicate  the  actual  DWS 
lifetime  knee  operating  point,  both  giving  a  relative 
percentage  difference  of  about  23  percent.  The  figure  for 
PFF  is  about  56  percent. 

Table H.3 summarizes the abilities of the models to
reproduce multiple knee behaviour under DWS and PFF. (VMIN
is not considered because its knees are not sharp.) The DWS
model produced the same number of knees as in the original




[Table H.1 body: rows for program strings P1, P3, P4, P5, P6 and
P7; columns GDL, GPL and GL under each of DWS, PFF and VMIN; the
numeric entries are not reliably legible in this copy.
N.A. - not available.]

Table H.1 Relative percentage differences between actual
and synthetic lifetime curves


Figure H.1 P1 lifetime fit for GDL under DWS

Figure H.2 P1 lifetime fit for GPL under DWS

Figure H.3 P1 lifetime fit for GL under DWS

Figure H.4 P1 lifetime fit for GDL under PFF

Figure H.5 P1 lifetime fit for GPL under PFF

Figure H.6 P1 lifetime fit for GL under PFF

Figure H.7 P1 lifetime fit for GDL under VMIN

Figure H.8 P1 lifetime fit for GPL under VMIN

Figure H.9 P1 lifetime fit for GL under VMIN


[Table H.2 body: for each program string, the knee memory size
K1, the actual lifetime at K1, and the GDL, GPL and GL lifetimes
at K1 with their relative percentage differences, followed by the
average relative percentage difference per model; the numeric
entries are not reliably legible in this copy.]

Table H.2 Synthetic reproduction of actual knee lifetimes




[Table H.3 body not legible in this copy.]

Table H.3 Synthetic reproduction of number of lifetime knees




on four of the six curves. The PFF model produced this
number twice, the LRUSM once. For the PFF policy, the PFF
model performed the best, producing the correct number of
knees on three of the four curves. The DWS model and the
LRUSM both produced this number twice. At least in this
respect, the semi-Markov policy models are clearly superior
to the LRUSM. However, the operating points of the knees of
the synthetic lifetime curves did not match the actual knee
operating points well.


